$E_{\rho,s}(X)$  See (6.8), p. 254
$E_{\rho,b}(X)$  See (6.9), p. 254
$E_{\rho,r}(X)$  See (6.10), p. 254
$E_{\rho,p}(X)$  See (6.11), p. 254
$E_{\rho,\lambda}(X)$  See (6.12), p. 254

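$\mathcal{K}_{\rho,x}(\mathcal{H})$  Kernel of $E_{\rho,x}$, p. 256
$\mathcal{M}_{\rho,x}(\mathcal{H})$  Quotient matrix space $\mathcal{M}(\mathcal{H})/\mathcal{K}_{\rho,x}(\mathcal{H})$, p. 256
$\mathcal{M}^{(m)}_{\rho,x}(\mathcal{H})$  Image of the map $E_{\rho,x}(\{X \in \mathcal{M}(\mathcal{H}) \mid P_\rho X = X\})$, p. 257
$P_\rho$  Projection to the range of $\rho$, p. 257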

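$\langle Y, X\rangle^{(e)}_{\rho,x}$  See (6.13), p. 254
$\|X\|^{(e)}_{\rho,x}$  See (6.14), p. 255
$\langle A, B\rangle^{(m)}_{\rho,x}$  See (6.18), p. 255
$\|A\|^{(m)}_{\rho,x}$  See (6.19), p. 255
$\kappa_{\rho,x}$  See (6.21), p. 256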

$L_{\theta,x}$  e representation based on inner product x (6.30), p. 260
$L_{\theta,s}$  SLD e representation (6.31), p. 260
$L_{\theta,b}$  Bogoljubov e representation, p. 261
$L_{\theta,r}$  RLD e representation (6.31), p. 260

TT? .£9 SLD e parallel transport, p. 266 

Ti Po Bogoljubov e parallel transport, p. 266 

TT? 09 RLD e parallel transport, p. 266

$\mathrm{Re}\,X$  Real part of matrix X, p. 3
$\mathrm{Im}\,X$  Imaginary part of matrix X, p. 3
$V_\theta(X)$  Matrix with components $(\mathrm{Tr}\,\rho_\theta X^i X^j)$, p. 284
$\Delta_{\rho,\sigma}$  Relative modular operator, p. 290


Error Criteria
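$V_\theta(M^n, \hat{\theta}_n)$  Mean square error (MSE) of estimator $(M^n, \hat{\theta}_n)$ (6.71), p. 273
$\mathbf{V}_\theta(M^n, \hat{\theta}_n)$  Mean square error matrix (6.94), p. 281
$\mathbf{V}_\theta(\{M^n, \hat{\theta}_n\})$  Matrix of components $V^{i,j}_\theta(\{M^n, \hat{\theta}_n\}) \stackrel{\rm def}{=} \lim n V^{i,j}_\theta(M^n, \hat{\theta}_n)$, p. 281
$\beta(\{(M^n, \hat{\theta}_n)\}, \theta, \epsilon)$  Rate function of error probability (6.84), p. 276
$\alpha(\{(M^n, \hat{\theta}_n)\}, \theta)$  First-order coefficient of rate function (6.86), p. 278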


Disturbances and Uncertainties 


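$\Delta_1(X, \rho)$  Uncertainty of an observable (7.12), p. 330
$\Delta_2(M, \rho)$  Uncertainty of a measurement (7.13), p. 330
$\Delta_3(M, X, \rho)$  Deviation of POVM M from observable X (7.17), p. 330
$\Delta_4(\kappa, X, \rho)$  Disturbance of X caused by $\kappa$ (7.23), p. 332
$\Delta_4(\boldsymbol{\kappa}, X, \rho)$  Disturbance of X caused by $\boldsymbol{\kappa}$ (7.25), p. 332
$\varepsilon(\rho, \boldsymbol{\kappa})$  Amount of state reduction by $\boldsymbol{\kappa}$ (7.59), p. 342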


Information Quantities of q-q Channel κ and State ρ


$F_e(\rho, \kappa)$  Entanglement fidelity for TP-CP $\kappa$ (8.18), p. 365
$F_e(\rho, \boldsymbol{\kappa})$  Entanglement fidelity for an instrument (8.29), p. 366
$I(\rho, \kappa)$  Transmission information of q-q channel $\kappa$ (8.35), p. 370
$I_c(\rho, \kappa)$  Coherent information (8.37), p. 370
$\tilde{I}_c(\rho, \kappa)$  Pseudocoherent information (8.48), p. 372
$H_e(\kappa, \rho)$  Entropy exchange $H((\kappa \otimes \iota_R)(|x\rangle\langle x|))$, p. 372
$\chi_\kappa(\rho)$  Holevo information (9.6), p. 494
$H_\kappa(\rho)$  Minimum average output entropy (9.7), p. 494


Class of Local Operations (C =)


$\emptyset$  Only local operations, p. 360
$\dashrightarrow$  Local operations and zero-rate classical communications from A to B, p. 403
$\rightarrow$  Local operations and classical communications from A to B, p. 360
$\leftarrow$  Local operations and classical communications from B to A, p. 360
$\leftrightarrow$  Local operations and two-way classical communications between A and B, p. 360
$S$  Separable operations, p. 361
PPT  Positive partial transpose (PPT) operations, p. 418


Entanglement Measures


$E_{sq}(\rho)$  Squashed entanglement (8.127), p. 394
$E_c^{\dashrightarrow}(\rho)$  Entanglement of cost with zero-rate communication (8.161), p. 403
$E_f(\rho)$  Entanglement of formation (8.97), p. 387
$E_{r,S}(\rho)$  Entanglement of relative entropy with separable states $\min_{\sigma \in S} D(\rho\|\sigma)$ (8.77), p. 383
$E^{\infty}_{r,S}(\rho)$  Asymptotic entanglement of relative entropy with separable states (8.82), p. 383
$E_{1+s|S}(\rho)$  Entanglement of relative Rényi entropy with separable states $\min_{\sigma \in S} D_{1+s}(\rho\|\sigma)$ (8.133), p. 396
$\underline{E}_{1+s|S}(\rho)$  Entanglement of relative Rényi entropy with separable states $\min_{\sigma \in S} \underline{D}_{1+s}(\rho\|\sigma)$ (8.134), p. 396
$E_{r,PPT}(\rho)$  Entanglement of relative entropy with PPT states $\min_{\sigma:\mathrm{PPT}} D(\rho\|\sigma)$, p. 418
$E_{SDP}(\rho)$  SDP bound $\min_{\sigma} D(\rho\|\sigma) + \log\|\tau^A(\sigma)\|_1$, p. 418
$E_{1+s|SDP}(\rho)$  SDP bound with relative Rényi entropy (8.244) $\min_{\sigma} D_{1+s}(\rho\|\sigma) + \log\|\tau^A(\sigma)\|_1$, p. 425
$\underline{E}_{1+s|SDP}(\rho)$  SDP bound with relative Rényi entropy (8.245) $\min_{\sigma} \underline{D}_{1+s}(\rho\|\sigma) + \log\|\tau^A(\sigma)\|_1$, p. 425
Entanglement of purification (8.164), p. 404
Logarithm of Schmidt rank (8.113), p. 391
Concurrence (8.317), p. 444


Operational Entanglement Measure with Class C


Entanglement of distillation (8.72), p. 382 
Strong converse entanglement of distillation (8.73), p. 382 


Entanglement of distillation (8.75), p. 382
Strong converse entanglement of distillation (8.76), p. 382


Asymptotic entanglement of exact distillation (8.89), p. 386
Entanglement of exact distillation (8.89), p. 386


Exponential decreasing rate for entanglement of distillation (8.90), p. 386

Entanglement of cost (8.107), p. 390

Asymptotic entanglement of exact cost (8.112), p. 391
Entanglement of exact cost (8.112), p. 391


Exponential decreasing rate for entanglement of cost (8.91), p. 386
Maximum of negative conditional entropy (8.119), p. 393
Eee Jatt p) Maximum of negative conditional Rényi entropy (8.131), p. 396

ET sal p) Maximum of negative conditional Rényi entropy (8.132), p. 396

ES, (0) Conclusive teleportation fidelity (8.88), p. 385 


Security Measures 


$d_1(A:E|\rho)$  Measure for independence $\|\rho - \rho_A \otimes \rho_E\|_1$ (8.283), p. 433
$F(A:E|\rho)$  Measure for independence $F(\rho, \rho_A \otimes \rho_E)$ (8.284), p. 433
$I'(A:E|\rho)$  Measure for independence and uniformity $D(\rho\|\rho_{\mathrm{mix},A} \otimes \rho_E)$ (8.285), p. 434
$d_1'(A:E|\rho)$  Measure for independence and uniformity $\|\rho - \rho_{\mathrm{mix},A} \otimes \rho_E\|_1$ (8.287), p. 434
$F'(A:E|\rho)$  Measure for independence and uniformity $F(\rho, \rho_{\mathrm{mix},A} \otimes \rho_E)$ (8.288), p. 434


Other Types of Correlation 


Cn Measure of classical correlation (8.170), p. 407 

D(BIA), Discord J,(A : B) — C4~8(p) (8.177), p. 408 

C.(p) See (8.198), p. 413 

C(p, 4) See (8.200), p. 413 

C(p,6) See (8.201), p. 413 

C(p) C(p,0) = C(p,0) (8.203), p. 413 

Creep) Optimal generation rate of secret key with one-way 
communication (9.82), p. 521 

Oy maeame (| See (9.83), p. 521 


Notations for Bipartite System 


$\mathcal{H}_s$  Symmetric space, p. 408
$\mathcal{H}_a$  Antisymmetric space, p. 408
$F$  Flip operator $P_s - P_a$, p. 408


Entangled States 


$|\Phi_L\rangle\langle\Phi_L|$  Maximally entangled state of size L, p. 360
Ou Maximally correlated state (8.142), p. 398 
$\rho_{W,p}$  Werner state (8.323), p. 445
$\rho_{I,p}$  Isotropic state (8.328), p. 447


Channel Capacities


C-(K) 
Ce(k) 
Ce?) 


Ce e(K) 
C,(W, oc) 
C8 (W) 
Ca 

Cq2 

cl o(1) 
Cspp(k) 
C..(W) 


C2,(W) 


car 


Coe(W) 


Cee(W) 


Ce e(K) 


Co8 (Kk) 


Corl) 


Cc 


CeR(K) 


Classical capacity without entangled input states (9.1), p. 493
Classical capacity with entangled input states (9.2), p. 493
Amount of assistance for sending information by state $\rho^{A,B}$ (9.37), p. 502
Entanglement-assisted classical capacity (9.42), p. 505
Quantum-channel resolvability capacity (9.57), p. 511
Wiretap channel capacity (9.73), p. 517
Quantum capacity in worst case (9.101), p. 527
Quantum capacity with entanglement fidelity (9.101), p. 527
Strong converse quantum capacity (9.122), p. 535
SDP bound (9.127), p. 536
Channel capacity for sending classical information with shared randomness (10.83)
Reverse channel capacity for sending classical information with shared randomness (10.82), p. 594
Channel capacity for sending classical information with shared entanglement, p. 596
Reverse channel capacity for sending classical information with shared entanglement, p. 596
Channel capacity for sending classical information with shared entanglement and entangled input, p. 505
Reverse channel capacity for sending classical information with shared entanglement and entangled input, p. 596
Channel capacity for sending classical information with shared randomness and entangled input, p. 505
Reverse channel capacity for sending classical information with shared randomness and entangled input, p. 597
Channel capacity for sending quantum states with shared entanglement and entangled input, p. 597
Reverse channel capacity for sending quantum states with shared entanglement and entangled input, p. 597
Channel capacity for sending quantum states with shared randomness and entangled input, p. 597
Reverse channel capacity for sending quantum states with shared randomness and entangled input, p. 597


Minimum Compression Rates 


Req(p, Ww) 


Rya( ) Ww) 


Minimum compression rate in blind and ensemble setting 
(10.4), p. 572 

Minimum compression rate in visible and ensemble setting 
(10.5), p. 572 




Rpq(p) 

Rh (pW) 
Rh q(P.W) 
Rp g(0) 

Rv c(p, W) 
Rvar(p, W) 
Ryer(p,W) 




Minimum compression rate in purification setting (10.15), 
p. 573 

Strong converse compression rate in blind and ensemble 
setting (10.6), p. 572 

Strong converse compression rate in visible and ensemble 
setting (10.7), p. 572 

Strong converse compression rate in purification setting 
(10.16), p. 573 

Minimum visible compression rate with classical memory 
(10.60), p. 587 

Minimum visible compression rate with quantum memory 
and shared randomness (10.72), p. 591 

Minimum visible compression rate with classical memory 
and shared randomness (10.73), p. 591 


Codes for Quantum Source Coding 


SEE ES 


Blind code, p. 571 

Visible code, p. 571 

Visible code by classical memory, p. 586 

Visible code with common randomness, p. 590 

Visible code with common randomness by classical memory, p. 591


Invitation to Quantum Information Theory


Understanding the implications of recognizing matter and extracting information 
from it has been a long-standing issue in philosophy and religion. However, 
recently this problem has become relevant to other disciplines such as cognitive 
science, psychology, and neuroscience. Indeed, this problem is directly relevant to 
quantum mechanics, which forms the foundation of modern physics. In the process 
of recognition, information cannot be obtained directly from matter without any 
media. To obtain information, we use our five senses; that is, a physical medium is 
always necessary to convey information to us. For example, in vision, light works 
as the medium for receiving information. Therefore, observations can be regarded 
as information processing via a physical medium. Hence, this problem can be 
treated by physics. Of course, to analyze this problem, the viewpoint of information 
science is also indispensable because the problem involves, in part, information 
processing. 

In the early twentieth century, physicists encountered some unbelievable facts
regarding observations (measurements) in the microscopic world. They discovered
the seemingly contradictory properties of light, i.e., the fact that light has both
wave- and particle-like properties. Indeed, light behaves like a collection of
minimum-energy particles called photons. In measurements using light, we observe
the light after it has interacted with the target. For example, when we measure the
position of an object, we detect photons after they have interacted with it. Since
photons possess momentum and energy, the speed of the object is inevitably
disturbed.¹ In particular, this disturbance cannot be ignored when the mass of the
measured object is small in comparison with the energy of the photon. Thus, even if
we measure the velocity of an object after measuring its position, we cannot know
its original velocity precisely because that velocity has been disturbed by the
first measurement. For the same reason, when we measure the velocity first, the
position is disturbed. Therefore, our naive concept of a “perfect measurement”
cannot be applied, even in principle. In the macroscopic world, the mass of the
objects is much larger than the momentum of the photons. We may therefore
effectively ignore the disturbance caused by the collisions of the photons. Although
we consider that a “perfect measurement” is possible in this macroscopic world, the
same intuition cannot be applied to the microscopic world.

¹ The disturbance of measurement is treated in more detail in the formulation of
quantum mechanics in Chap. 7.

In addition to the impossibility of “perfect measurements” in the microscopic
world, no microscopic particles have both a determined position and a determined
velocity. This fact is deeply connected to the wave-particle duality in the
microscopic world and can be regarded as the other side of the nonexistence of
“perfect measurements.”² Thus it is impossible to completely understand this
microscopic world based on our macroscopic intuitions, but it is possible to predict
measured values probabilistically based on the mathematical formulation of quantum
theory.

So far, the main emphasis of quantum mechanics has been on examining the
properties of matter itself, rather than the process of extracting information. To
discuss how the microscopic world is observed, we need a quantitative consideration
from the viewpoint of “information.” Thus, to formulate this problem clearly,
we need various theories and techniques concerning information. Therefore, the
traditional approach to quantum mechanics is insufficient. On the other hand,
theories relating to information pay attention only to data processing rather than
to the process of extracting information. Therefore, in this quantum-mechanical
context, we must take into account the process of obtaining information from
microscopic (quantum-mechanical) particles. We must open ourselves to the new
research field of quantum information science. This field can be broadly divided
into two parts: (1) quantum computer science, in which algorithms and complexity
are analyzed using an approach based on computer science, and (2) quantum
information theory, in which various protocols are examined from the viewpoint of
information theory and their properties and limits are studied. Specifically, since
quantum information theory focuses on the amount of accessible information, it can
be regarded as the theory for quantitative evaluation of the process of extracting
information, as mentioned above.

Since there have been only a few textbooks describing the recent developments 
in this field [1, 2], the present textbook attempts to provide comprehensive infor- 
mation ranging from the fundamentals to current research. Quantum computer 
science is not treated in this book because it has been addressed in many other 
textbooks. Since quantum information theory forms a part of the basis of quantum 
computer science, this textbook may be useful for not only researchers in quantum 
information theory but also those in quantum computer science. 


² The relation between this fact and the nonexistence of “perfect measurements” can
be mathematically formulated by (7.27) and (7.30).


History of Quantum Information Theory in the Twentieth Century


Although quantum information theory has been very actively studied in the
twenty-first century, its roots can be traced to studies in the twentieth century.
Let us briefly discuss the history of quantum information theory in the twentieth
century. Quantum mechanics was first formulated by Schrödinger (wave mechanics) and
Heisenberg (matrix mechanics), and the equivalence of the two formulations was
proved later. These formulations described the dynamics of microscopic systems, but
they had several unsatisfactory aspects in their descriptions of measurements. To
resolve this point, von Neumann [3] established the formulation of quantum theory
that describes measurements as well as dynamics based on operator algebra, whose
essential features will be discussed in Chap. 1. However, in studies of measurements
following this work, the philosophical aspect has been emphasized too much, and a
quantitative approach to extracting information via measurements has not been
examined in detail. This is probably because approaches from mathematical
engineering have not been adopted in the study of measurements.

In the latter half of the 1960s, a Russian researcher named Stratonovich, who is 
one of the founders of stochastic differential equations, and two American 
researchers, Helstrom and Gordon, proposed a formulation of optical communi- 
cations using quantum mechanics. This was the first historical appearance of 
quantum information theory. Gordon [4, 5], Helstrom [6], and Stratonovich [7] 
mainly studied error probabilities and channel capacities for communications. 
Meanwhile, Helstrom [8] examined the detection process of optical communication 
as parameter estimation. Later, many American and Russian researchers such as 
Holevo [9, 10], Levitin [11], Belavkin [12], Yuen [13], and Kennedy [14] also 
examined these problems.³ In particular, Holevo obtained the upper bound of the
communication speed in the transmission of a classical message via a quantum 
channel in his two papers [9, 10] published in the 1970s. Further, Holevo [16, 18], 
Yuen [13], Belavkin, and their coworkers also analyzed many theoretically 
important problems in quantum estimation. 

Unfortunately, the number of researchers in this field rapidly decreased in the 
early 1980s, and this line of research came to a standstill. Around this time, Bennett 
and Brassard [19] proposed a quantum cryptographic protocol (BB84) using a 
different approach to quantum mechanical systems. Around the same time, Ozawa 
[20] gave a precise mathematical formulation of the state reduction in the mea- 
surement process in quantum systems. 


³ Other researchers during this period include Grishanin, Mityugov, Kuriksha, Liu,
Personick, Lax, Lebedev, and Forney [15] in the United States and Russia. Many papers
were published by these authors; however, an accurate review of all of them is made
difficult by their lack of availability. In particular, while several Russian papers
have been translated into English, some of them have been overlooked despite their
high quality. For details, see [16, 17].

In the latter half of the 1980s, Nagaoka investigated quantum estimation theory
as a subfield of mathematical statistics. He developed the asymptotic theory of 
quantum-state estimation and quantum information geometry [21]. This research 
was continued by many Japanese researchers, including Fujiwara, Matsumoto, and 
the present author in the 1990s [22-39]. For this history, see Hayashi [40]. 

In the 1990s, in the United States and Europe several researchers started 
investigating quantum information processing, e.g., quantum data compression, 
quantum teleportation, superdense coding, another quantum cryptographic protocol 
(B92), etc. [41-46]. In the second half of the 1990s, the study of quantum infor- 
mation picked up speed. In the first half of the 2000s, several information-theoretic 
approaches were developed, and research has been advancing at a rapid pace. 

We see that progress in quantum information theory has been achieved by 
connecting various topics. This text clarifies these connections and discusses cur- 
rent research topics starting with the basics. 


Structure of the Book 


Quantum information theory has been studied by researchers from various
backgrounds. Their approaches can be broadly divided into two categories. The first
approach is based on information theory. In this approach, existing methods 
for information processing are translated (and extended) into quantum systems. The 
second approach is based on quantum mechanics. 

In this text, four chapters are dedicated to examining problems based on the first 
approach, i.e., establishing information-theoretic problems. These are Chap. 3, 
“Quantum Hypothesis Testing and Discrimination of Quantum States,” Chap. 4, 
“Classical Quantum Channel Coding (Message Transmission),” Chap. 6, “Quantum 
Information Geometry and Quantum Estimation,” and Chap. 10, “Source Coding in 
Quantum Systems.” Problems based on the second approach are treated in three
chapters: Chap. 5, “State Evolution and Trace-Preserving Completely Positive
Maps,” Chap. 7, “Quantum Measurement and State Reduction,” and Chap. 8,
“Entanglement and Locality Restrictions.”

Advanced topics in quantum communication such as quantum teleportation, 
superdense coding, quantum-state transmission (quantum error correction), and 
quantum cryptography are often discussed in quantum information theory. Both 
approaches are necessary for understanding these topics, which are covered in 
Chap. 9, “Analysis of Quantum Communication Protocols.” 

Some quantum-mechanical information quantities are needed to handle these
problems mathematically, and these quantities are covered in Sects. 3.1, 5.4, 5.5,
5.6, 8.2, and 8.3. This allows us to touch upon several important
information-theoretic problems using a minimum amount of mathematics. The book also
includes 450 exercises together with solutions. Solving these exercises should
provide readers not only with knowledge of quantum information theory but also with
the techniques necessary for pursuing original research in the field.

Chapter 1 covers the mathematical formulation of quantum mechanics in the
context of quantum information theory. It also gives a review of linear algebra. 
Chapter 2 summarizes classical information theory. This not only provides an 
introduction to the later chapters but also serves as a brief survey of classical 
information theory. This chapter covers entropy, Fisher information, information
geometry, estimation of probability distributions, and the large deviation
principle. It also discusses the axiomatic characterization of entropy. This
concludes the preparatory part of the text. Section 2.6 treats large deviations on
the sphere, which is used only in Sect. 8.13, so a reader can skip it until starting
Sect. 8.13.

Chapter 3 covers quantum hypothesis testing and the discrimination of quantum
states. This chapter starts with an introduction of information quantities in
quantum systems. It then addresses the question: If there are two candidate states,
which is the true state? The importance of this question may not at first be
apparent. However, this problem provides the foundation for other problems in
information theory and is therefore crucially important. It also provides the basic
methods for quantum algorithm theory. Many of the results of this chapter will be
used in subsequent chapters. In particular, the quantum version of Stein’s lemma is
discussed here; it can be used as a basic tool for other topics. Furthermore, many
of the difficulties associated with the noncommutativity of quantum theory can be
seen here in their simplest forms. Most of this chapter can be read after Chap. 1
and Sects. 2.1 and A.3.

Chapter 4 covers classical quantum channel coding (message transmission). That 
is, we treat the tradeoff between the transmission speed and the error probability in 
the transmission of classical messages via quantum states. In particular, we discuss 
the channel capacity, i.e., the theoretical bound of the transmission rate when the 
error probability is 0, as well as its associated formulas. This chapter can be read 
after Chap. 1 and Sects. 2.1, 3.1, 3.5, 3.7, and 3.8. 

Chapter 5 discusses the trace-preserving completely positive map, which is the 
mathematical description of state evolution in quantum systems. Its structure will be 
illustrated with examples in quantum two-level systems. We also briefly discuss the 
relationship between the state evolution and information quantities in quantum 
systems (the entropy and relative entropy). In particular, the part covering the
formulation of quantum mechanics (Sects. 5.1–5.3) can be read after only Chap. 1.

Chapter 6 describes the relation among quantum information geometry, quantum 
information quantities, and quantum estimation. First, the inner product for the 
space of quantum states is briefly discussed. Next, we discuss the geometric 
structure naturally induced from the inner product. The theory of state estimation in 
quantum systems is then discussed by emphasizing the Cramér–Rao inequality.
Most of this chapter except for Sect. 6.7 can be read after Chaps. 1 and 2 and
Sect. 5.1. Section 6.7 can be read after Chap. 1 and Sects. 5.1, 5.4, and 6.1.

Chapter 7 covers quantum measurement and state reduction. First, it is shown 
that the state reduction due to a quantum measurement follows naturally from the 
axioms of the quantum systems discussed in Chap. 1. Next, we discuss the relation 
between quantum measurement and two types of uncertainty relations: square-error-type
uncertainty and entropic uncertainty. Finally, it is shown that under certain
conditions it is possible, in principle, to perform a measurement such that the
required information can be obtained while the state demolition is negligible.
Readers who only wish to read Sects. 7.1 and 7.4 can read them after Chap. 1 and
Sect. 5.1. Section 7.2 requires the additional background of Sect. 6.1. Section 7.3
can be read after Chap. 1 and Sects. 5.1, 5.4, 5.5, and 5.6.

Chapter 8 discusses the relation between locality and entanglement, which are 
fundamental topics in quantum mechanics. First, we examine state operations when 
the locality condition is imposed on quantum operations. Next, the information 
quantities related to entanglement are considered. The theory for distilling a perfect 
entangled state from a partially entangled state is discussed. Information-theoretic 
methods play a central role in entanglement distillation. Quantification of
entanglement is discussed from various viewpoints. As the opposite task, we discuss
entanglement dilution, which evaluates the cost of generating a given partially
entangled state. While this task is characterized by the entanglement of formation,
we also discuss the nonadditivity of this quantity. As another type of correlation,
we discuss discord. Further, we consider the duality of conditional entropy,
secure random number generation, and state generation from shared randomness. 

Chapter 9 delves deeply into topics in quantum channels such as quantum 
teleportation, superdense coding, quantum-state transmission (quantum error cor- 
rection), and quantum key distribution based on the theory presented in previous 
chapters. These topics are very simple when noise is not present. However, if noise 
is present in a channel, these problems require the information-theoretic methods 
discussed in previous chapters. The relationship among these topics is also dis- 
cussed. Further, the relation between channel capacities and entanglement theory is 
also treated. The additivity problem for the classical-quantum channel capacity is 
discussed in Sects. 8.13 and 9.2. 

Finally, Chap. 10 discusses source coding in quantum systems. We treat not only 
the theoretical bounds of quantum fixed-length source coding but also universal 
quantum fixed-/variable-length source coding, which does not depend on the form 
of the information source. The beginning part of this chapter, excepting the
purification scheme, requires only the contents of Chaps. 1 and 2 (Sects. 2.1–2.4) and
Sect. 5.1. In particular, in universal quantum variable-length source coding, a
measurement is essential for determining the coding length. Hence this measurement 
causes the demolition of the state to be sent, which makes this a more serious 
problem. However, it can be solved by a measurement with negligible state 
demolition, which is described in Chap. 7. Then we treat quantum-state compression 
with mixed states and its several variants. The relations between these problems and 
entanglement theory are also treated. Further, we treat the relationships between the
reverse capacities (reverse Shannon theorem) and these problems. Excluding Sects.
10.6–10.9, this chapter can be read after Chap. 1 and Sects. 2.1, 2.3, 3.1, 4.1, and 5.1.

This text thus covers a wide variety of topics in quantum information theory.
Quantum hypothesis testing, quantum-state discrimination, and quantum-channel
coding (message transmission) have been discussed such that only a minimal
amount of mathematics is needed to convey the essence of these topics. Prior to this
text, these topics required the study of advanced mathematical theories for quantum
mechanics, such as those presented in Chap. 5. Further, Chaps. 5 (“State Evolution
and Trace Preserving Completely Positive Maps in Quantum Systems”) and 7
(“Quantum Measurement and State Reduction”) have been written such that they
can be understood with only the background provided in Chap. 1. Therefore, this
text should also be suitable for readers who are interested in either the
information-theoretic aspects of quantum mechanics or the foundations of quantum
mechanics.


Chapter 1
Mathematical Formulation of Quantum Systems


Abstract In this chapter, we cover the fundamentals of linear algebra and provide
a mathematical formulation of quantum mechanics for use in later chapters. It is
necessary to understand these topics since they form the foundation of the quantum
information processing discussed later. In the first section, we cover the
fundamentals of linear algebra and introduce some notation. The next section
describes the formulation of quantum mechanics. Further, we examine a quantum
two-level system, which is the simplest example of a quantum-mechanical system.
Finally, we discuss the tensor product and matrix inequalities. More advanced
discussions on linear algebra are available in the Appendix.


1.1 Quantum Systems and Linear Algebra 


In order to treat information processing in quantum systems, it is necessary to mathe- 
matically formulate fundamental concepts such as quantum systems, measurements, 
and states. First, we consider the quantum system. It is described by a Hilbert space H 
(a finite- or infinite-dimensional complex vector space with a Hermitian inner prod- 
uct), which is called a representation space. Before considering other important 
concepts such as measurements and states, we give a simple overview of linear
algebra. This will be advantageous because linear algebra is not only the underlying
basis of quantum mechanics but is also helpful in introducing the special notation
used for quantum mechanics. In mathematics, a Hilbert space usually refers to an
infinite-dimensional
complex vector space with a Hermitian inner product. In physics, however, a Hilbert 
space also often includes finite-dimensional complex vector spaces with Hermitian 
inner products. This is because in quantum mechanics, the complex vector space with 
a Hermitian inner product becomes the crucial structure. Since infinite-dimensional 
complex vector spaces with Hermitian inner products can be dealt with analogously 
to the finite-dimensional case, we will consider only the finite-dimensional case in 
this text. Unless specified, the dimension will be labeled d. 

The representation space of a given system is determined by a physical observation.
For example, spin-1/2 particles such as electrons possess an internal degree of
freedom corresponding to “spin” in addition to their motional degree of freedom.
The representation space of this degree of freedom is $\mathbb{C}^2$. The
representation space of a one-particle system with no internal degrees of freedom is
the set of all square-integrable functions from $\mathbb{R}^3$ to $\mathbb{C}$. In
this case, the representation space of the system is an infinite-dimensional space,
which is rather difficult to handle. Such cases will not be examined in this text.

Before discussing the states and measurements, we briefly summarize some basic 
linear algebra with some emphasis on Hermitian matrices. This will be important 
particularly for later analysis. The Hermitian product of two vectors 


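\[
u = \begin{pmatrix} u^1 \\ u^2 \\ \vdots \\ u^d \end{pmatrix}, \quad
v = \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^d \end{pmatrix} \in \mathcal{H}
\]

is given by

\[
\langle u|v\rangle \overset{\mathrm{def}}{=} \overline{u^1}v^1 + \overline{u^2}v^2 + \cdots + \overline{u^d}v^d \in \mathbb{C},
\]

where the complex conjugate of a complex number $x$ is denoted by $\overline{x}$. The norm of
the vector is given by $\|u\| = \sqrt{\langle u|u\rangle}$. The inner product of the vectors satisfies the
Schwarz inequality

\[
\|u\|\,\|v\| \ge |\langle u|v\rangle|. \tag{1.1}
\]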
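When a matrix

\[
X = \begin{pmatrix} x^{1,1} & x^{1,2} & \cdots & x^{1,d} \\ x^{2,1} & x^{2,2} & \cdots & x^{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x^{d,1} & x^{d,2} & \cdots & x^{d,d} \end{pmatrix} \tag{1.2}
\]

satisfies the following condition

\[
X = X^* \overset{\mathrm{def}}{=} \begin{pmatrix} \overline{x^{1,1}} & \overline{x^{2,1}} & \cdots & \overline{x^{d,1}} \\ \overline{x^{1,2}} & \overline{x^{2,2}} & \cdots & \overline{x^{d,2}} \\ \vdots & \vdots & \ddots & \vdots \\ \overline{x^{1,d}} & \overline{x^{2,d}} & \cdots & \overline{x^{d,d}} \end{pmatrix}, \tag{1.3}
\]

it is called Hermitian. We also define the complex conjugate matrix $\overline{X}$ and its transpose
matrix $X^T$ as follows:

\[
\overline{X} \overset{\mathrm{def}}{=} \begin{pmatrix} \overline{x^{1,1}} & \overline{x^{1,2}} & \cdots & \overline{x^{1,d}} \\ \overline{x^{2,1}} & \overline{x^{2,2}} & \cdots & \overline{x^{2,d}} \\ \vdots & \vdots & \ddots & \vdots \\ \overline{x^{d,1}} & \overline{x^{d,2}} & \cdots & \overline{x^{d,d}} \end{pmatrix}, \quad
X^T \overset{\mathrm{def}}{=} \begin{pmatrix} x^{1,1} & x^{2,1} & \cdots & x^{d,1} \\ x^{1,2} & x^{2,2} & \cdots & x^{d,2} \\ \vdots & \vdots & \ddots & \vdots \\ x^{1,d} & x^{2,d} & \cdots & x^{d,d} \end{pmatrix}. \tag{1.4}
\]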
Also, we denote the real part $\frac{X+\overline{X}}{2}$ of matrix $X$ and the imaginary part $\frac{X-\overline{X}}{2i}$ of matrix
$X$ by $\mathrm{Re}\,X$ and $\mathrm{Im}\,X$, respectively. Then, a Hermitian matrix $X$ satisfies $X^T = \overline{X}$. If
a Hermitian matrix $X$ satisfies $\langle u|Xu\rangle \ge 0$ for an arbitrary vector $u \in \mathcal{H}$, it is called
positive semidefinite and denoted by $X \ge 0$. If $\langle u|Xu\rangle > 0$ for all nonzero vectors $u$, $X$
is called positive definite. A Hermitian matrix $X$ is positive semidefinite if and only
if all of its eigenvalues are either zero or positive. As shown later, the trace of the
product of two positive semidefinite matrices $X$ and $Y$ satisfies

\[
\mathrm{Tr}\,XY \ge 0. \tag{1.5}
\]


However, in general, the product XY is not a Hermitian matrix. Note that although 
the matrix XY + YX is Hermitian, it is generally not positive semidefinite. 
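As a quick numerical sanity check of these two remarks, the following NumPy sketch uses one illustrative pair of noncommuting positive semidefinite matrices. The particular matrices and variable names are assumptions made for this example only.

```python
import numpy as np

# Two positive semidefinite matrices that do not commute (illustrative choice).
X = np.array([[1.0, 0.0], [0.0, 0.0]])
Y = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])

trace_xy = np.trace(X @ Y).real            # Tr XY, nonnegative by (1.5)
sym = X @ Y + Y @ X                        # Hermitian by construction
min_eig_sym = np.linalg.eigvalsh(sym).min()

print(trace_xy)                            # 0.5
print(np.allclose(X @ Y, (X @ Y).T))       # False: XY itself is not Hermitian
print(min_eig_sym < 0)                     # True: XY + YX is not positive semidefinite
```

Here the smallest eigenvalue of XY + YX is (1 - sqrt(2))/2, which is negative even though Tr XY is positive.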

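We can regard each element $u \in \mathcal{H}$ as an element of the dual space $\mathcal{H}^*$ according
to the correspondence between $\mathcal{H}$ and $\mathcal{H}^*$ given by the inner product. We denote
the corresponding element of the dual space $\mathcal{H}^*$ by $\langle u|$, in accordance with the
conventional notation in physics. If we wish to emphasize that $u$ is an element not
of $\mathcal{H}^*$ but of $\mathcal{H}$, we write $|u\rangle$. That is,

\[
\langle u| = \begin{pmatrix} \overline{u^1} & \overline{u^2} & \cdots & \overline{u^d} \end{pmatrix} \in \mathcal{H}^*, \quad
|u\rangle = \begin{pmatrix} u^1 \\ u^2 \\ \vdots \\ u^d \end{pmatrix} \in \mathcal{H}.
\]

The Hermitian inner product $\langle u|v\rangle$ can also be considered as the matrix product of
$\langle u|$ and $|v\rangle$. Note that this notation is used in this text even if the norm of $v$ is not
equal to 1. On the other hand, the opposite matrix product $|v\rangle\langle u|$ is a $d \times d$ matrix: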


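\[
|v\rangle\langle u| = \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^d \end{pmatrix} \begin{pmatrix} \overline{u^1} & \overline{u^2} & \cdots & \overline{u^d} \end{pmatrix}
= \begin{pmatrix} v^1\overline{u^1} & v^1\overline{u^2} & \cdots & v^1\overline{u^d} \\ v^2\overline{u^1} & v^2\overline{u^2} & \cdots & v^2\overline{u^d} \\ \vdots & \vdots & \ddots & \vdots \\ v^d\overline{u^1} & v^d\overline{u^2} & \cdots & v^d\overline{u^d} \end{pmatrix}. \tag{1.6}
\]

Although $|Xv\rangle = X|v\rangle$, we have $\langle Xv| = \langle v|X^*$. Evidently, if matrix $X$ is Hermitian, then
$\langle u|Xv\rangle = \langle Xu|v\rangle$. This also equals $\mathrm{Tr}\,|v\rangle\langle u|X$, which is often denoted by $\langle u|X|v\rangle$.
Using this notation, matrix $X$ given by (1.2) may be written as $X = \sum_{i,j} x^{i,j}|u_i\rangle\langle u_j|$,
where $u_i$ is a unit vector whose $i$th element is 1 and remaining elements are 0.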
A Hermitian matrix $X$ may be transformed into the diagonal form $U^*XU$ by
choosing an appropriate unitary matrix $U$. Since $X = U(U^*XU)U^*$, we may write

\[
X = \begin{pmatrix} u_1^1 & \cdots & u_d^1 \\ \vdots & \ddots & \vdots \\ u_1^d & \cdots & u_d^d \end{pmatrix}
\begin{pmatrix} x^1 & & O \\ & \ddots & \\ O & & x^d \end{pmatrix}
\begin{pmatrix} \overline{u_1^1} & \cdots & \overline{u_1^d} \\ \vdots & \ddots & \vdots \\ \overline{u_d^1} & \cdots & \overline{u_d^d} \end{pmatrix}. \tag{1.7}
\]

Define $d$ vectors $u_1, u_2, \ldots, u_d$ by

\[
u_i = \begin{pmatrix} u_i^1 \\ \vdots \\ u_i^d \end{pmatrix}.
\]

Then the unitarity of $U$ implies that $\{u_1, u_2, \ldots, u_d\}$ forms an orthonormal basis,
which will be simply called a basis later. Using (1.6), the Hermitian matrix $X$ may
then be written as $X = \sum_i x^i|u_i\rangle\langle u_i|$. This process is called diagonalization. If $X$ and
$Y$ commute, they may be written as $X = \sum_{i=1}^d x^i|u_i\rangle\langle u_i|$, $Y = \sum_{i=1}^d y^i|u_i\rangle\langle u_i|$ using
the same orthonormal basis $\{u_1, u_2, \ldots, u_d\}$. If $X$ and $Y$ do not commute, they cannot
be diagonalized using the same orthonormal basis.
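The diagonalization can be tried out numerically. The sketch below, assuming NumPy and an arbitrarily chosen 2 x 2 Hermitian matrix, rebuilds X from its eigenvalues and eigenvectors and checks that a matrix commuting with X (here X^2 + 2I, chosen only for illustration) is diagonal in the same basis.

```python
import numpy as np

# Illustrative Hermitian matrix (an assumption of this sketch).
X = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])

# eigh returns real eigenvalues x^i and a unitary U whose columns are the u_i.
eigvals, U = np.linalg.eigh(X)

# Rebuild X as sum_i x^i |u_i><u_i|.
X_rebuilt = sum(x * np.outer(U[:, i], U[:, i].conj())
                for i, x in enumerate(eigvals))

# Y = X^2 + 2I commutes with X, so the same basis diagonalizes it.
Y = X @ X + 2.0 * np.eye(2)
Y_in_basis = U.conj().T @ Y @ U
off_diagonal = Y_in_basis - np.diag(np.diag(Y_in_basis))

print(np.allclose(X_rebuilt, X))         # True
print(np.allclose(off_diagonal, 0.0))    # True
```

The eigenvalues of this particular X are 1 and 4, so the basis is unique up to phases; for degenerate eigenvalues a common basis still exists but has to be selected inside each eigenspace.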

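Furthermore, we can characterize positive semidefinite matrices using this notation.
A Hermitian matrix $X = \sum_i x^i|u_i\rangle\langle u_i|$ is positive semidefinite if and only if $x^i \ge 0$
for every $i$. Thus, for positive semidefinite matrices $X$ and $Y$, this equivalence yields
inequality (1.5) as follows:

\[
\mathrm{Tr}\,XY = \mathrm{Tr}\sum_{i=1}^d x^i|u_i\rangle\langle u_i|Y = \sum_{i=1}^d x^i\,\mathrm{Tr}\,|u_i\rangle\langle u_i|Y = \sum_{i=1}^d x^i\langle u_i|Y|u_i\rangle \ge 0.
\]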


We also define the commutator $[X, Y]$ and symmetrized product $X \circ Y$ of two
matrices $X$ and $Y$ as¹

\[
[X, Y] \overset{\mathrm{def}}{=} XY - YX, \quad X \circ Y \overset{\mathrm{def}}{=} \frac{1}{2}(XY + YX). \tag{1.8}
\]

Exercises 


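1.1 Show Schwarz's inequality (1.1) noting that $\langle u + rcv|u + rcv\rangle \ge 0$ for an arbitrary
real number $r$, where $c = \langle v|u\rangle/|\langle v|u\rangle|$.


1.2 Suppose that $k$ vectors $u_1, \ldots, u_k$ ($u_j = (u_j^i)$) satisfy $\langle u_{j'}|u_j\rangle = \sum_{i=1}^d \overline{u_{j'}^i}u_j^i =
\delta_{j',j}$ ($1 \le j, j' \le k$), where $\delta_{j,j'}$ is defined as 1 when $j = j'$ and as 0 otherwise. Show
that there exist $d - k$ vectors $u_{k+1}, \ldots, u_d$ such that $\langle u_{j'}|u_j\rangle = \delta_{j',j}$ for $1 \le j, j' \le d$,
i.e., the matrix $U = (u_j^i)$ is unitary.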


1A vector space closed under commutation [X, Y] is called a Lie algebra. A vector space closed
under the symmetrized product X ∘ Y is called a Jordan algebra.
1.3 Let $X = \sum_{i,j} x^{i,j}|u_i\rangle\langle u_j|$ be a Hermitian positive semidefinite matrix. Show that
the transpose matrix $X^T$ is also positive semidefinite.


1.4 Let $X_\theta$, $Y_\theta$ be matrix-valued functions. Show that the derivative of the product
$(X_\theta Y_\theta)'$ can be written as $(X_\theta Y_\theta)' = X_\theta Y_\theta' + X_\theta' Y_\theta$, where $X_\theta'$, $Y_\theta'$ are the derivatives
of $X_\theta$, $Y_\theta$, respectively. Show that the derivative of $\mathrm{Tr}\,X_\theta$, i.e., $(\mathrm{Tr}\,X_\theta)'$, is equal to
$\mathrm{Tr}(X_\theta')$.


1.5 Let $U$ be a unitary matrix and $X$ be a matrix. Show that the equation $(UXU^*)^* =
UX^*U^*$ holds. Also give a counterexample of the equation $(UXU^*)^T = UX^TU^*$.


1.2 State and Measurement in Quantum Systems 


To discuss information processing in quantum systems, we must first be able to 
determine the probability that every measurement outcome appears as an outcome 
of the measurement. Few standard texts on quantum mechanics give a concise and 
accurate description of the probability distribution of each measurement outcome. 
Let us discuss the fundamental framework of quantum theory so as to calculate the 
probability distribution of a measurement outcome. 

In the spin-1/2 system discussed previously, when the direction of “spin” changes,
the condition of the particle also changes. In quantum systems, a description of the
current condition of a system, such as the direction of the spin, is called a state.
Any state is described by a Hermitian matrix $\rho$ called a density matrix or simply
density:

\[
\mathrm{Tr}\,\rho = 1, \quad \rho \ge 0. \tag{1.9}
\]


Since quantum systems are too microscopic for direct observation, we must perform
some measurement in order to extract information from the system. Such a measurement
is described by a set of Hermitian matrices $M \overset{\mathrm{def}}{=} \{M_\omega\}_{\omega\in\Omega}$ satisfying the
following conditions:

\[
M_\omega \ge 0, \quad \sum_{\omega\in\Omega} M_\omega = I,
\]

where $I$ denotes the identity matrix. Here, $\Omega$ forms a set of measurement outcomes
$\omega$, and is called a probability space. The set $M = \{M_\omega\}_{\omega\in\Omega}$ is called a positive
operator valued measure (POVM). For readers who have read standard texts on
quantum mechanics, note that $M_\omega$ is not restricted to projection matrices. When $\omega$ is
continuous, the summation $\sum$ is replaced by an integration $\int$ on the probability space
$\Omega$. Here we denote the set of the measurement outcomes $\omega$ by the probability space $\Omega$
and omit it if there is no risk of confusion. If a probability space $\Omega$ is a discrete set,
then the number of elements of the probability space $\Omega$ is denoted by $|M|$. When
the rank of $M_\omega$ is 1 for any $\omega \in \Omega$, the POVM $M$ is called rank-one.
Fig. 1.1 Measurement scheme (labels: state ρ, measurement M, outcome value ω)


The density matrix $\rho$ and the POVM $M$ form the mathematical representations
of a state of the system and the measurement, respectively, in the following sense
(Fig. 1.1). If a measurement corresponding to $M = \{M_\omega\}_{\omega\in\Omega}$ is performed on a system
in a state corresponding to $\rho$, then the probability $P_\rho^M(\omega)$ of obtaining $\omega$ is²

\[
P_\rho^M(\omega) \overset{\mathrm{def}}{=} \mathrm{Tr}\,\rho M_\omega. \tag{1.10}
\]

The above definition satisfies the axioms of a probability since $\mathrm{Tr}\,\rho M_\omega$ is nonnegative
and its summation becomes 1, as follows. The inequality $\mathrm{Tr}\,\rho M_\omega \ge 0$ can be verified
by the fact that $\rho$ and $M_\omega$ are both positive semidefinite. Furthermore, since

\[
\sum_{\omega\in\Omega} \mathrm{Tr}\,\rho M_\omega = \mathrm{Tr}\,\rho \sum_{\omega\in\Omega} M_\omega = \mathrm{Tr}\,\rho I = \mathrm{Tr}\,\rho = 1,
\]

we see that this is indeed a probability distribution. In this formulation, it is implicitly
assumed that both the state and the measurement are reproducible (otherwise, it would
be impossible to verify (1.10) experimentally). For brevity, we shall henceforth refer
to the system, state, and measurement by $\mathcal{H}$, $\rho$, and $M$, respectively.
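The rule (1.10) is easy to evaluate numerically. The following is a minimal NumPy sketch under stated assumptions: the density matrix, the three-outcome POVM, and the helper names `outcome_probabilities` and `is_povm` are all illustrative choices made here. Note that the POVM elements are not projections.

```python
import numpy as np

def outcome_probabilities(rho, povm):
    """Return the probabilities Tr(rho M_w), one per POVM element, as in (1.10)."""
    return np.array([np.trace(rho @ M).real for M in povm])

def is_povm(povm, tol=1e-12):
    """Check M_w >= 0 for every w and sum_w M_w = I."""
    dim = povm[0].shape[0]
    nonneg = all(np.linalg.eigvalsh(M).min() >= -tol for M in povm)
    complete = np.allclose(sum(povm), np.eye(dim))
    return bool(nonneg and complete)

# Illustrative state: trace 1 and positive semidefinite.
rho = np.array([[0.75, 0.25], [0.25, 0.25]])

# Illustrative three-outcome POVM on a two-dimensional system.
M0 = 0.5 * np.array([[1.0, 0.0], [0.0, 0.0]])
M1 = 0.25 * np.array([[1.0, 1.0], [1.0, 1.0]])
M2 = np.eye(2) - M0 - M1
povm = [M0, M1, M2]

probs = outcome_probabilities(rho, povm)
print(is_povm(povm))     # True
print(probs.sum())       # sums to 1 up to rounding
```

For this choice the three probabilities are 0.375, 0.375, and 0.25.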

Let us now discuss the structure of a set of density matrices that shall be denoted
by $\mathcal{S}(\mathcal{H})$. Consider a system that is in state $\rho_1$ with a probability $\lambda$ and in state $\rho_2$ with
a probability $1 - \lambda$. Now, let us perform a measurement $M = \{M_\omega\}$ on the system.
The probability of obtaining the measurement outcome $\omega$ is given by

\[
\lambda\,\mathrm{Tr}\,\rho_1 M_\omega + (1 - \lambda)\,\mathrm{Tr}\,\rho_2 M_\omega = \mathrm{Tr}\,[(\lambda\rho_1 + (1 - \lambda)\rho_2)M_\omega]. \tag{1.11}
\]

The state of the system may be considered to be given by $\rho' = \lambda\rho_1 + (1 - \lambda)\rho_2$ (Exe. 1.7).
Thus, even by using (1.10) with this state and calculating the probability distributions
of the measurement outcomes, we will still be entirely consistent with the experiment.
Therefore, we may believe that the state of the system is given by $\rho'$. This is called
a probabilistic mixture (or incoherent mixture).


2In quantum mechanics, one often treats the state after the measurement rather than before it. The
state change due to a measurement is called state reduction, and it requires more advanced topics
than those described here. Therefore, we postpone its discussion until Sect. 7.1.
In quantum mechanics, a state $|u\rangle\langle u| \in \mathcal{S}(\mathcal{H})$ represented by a vector $u \in \mathcal{H}$ of
norm 1 is called a pure state. This $u$ is referred to as a state in the sense of $|u\rangle\langle u|$. The
set of all vectors of norm 1 is written as $\mathcal{H}^1$. In contrast, when a state is not a pure
state, it is called a mixed state. A pure state cannot be written as the probabilistic
mixture of states except for itself (Exe. 1.8). However, all mixed states may be written
as the probabilistic mixture of other states, such as pure states. For example, if the
dimensionality of $\mathcal{H}$ is $d$, then $\frac{1}{d}I$ is a mixed state. In fact, it is called a completely
mixed state and is written as $\rho_{\mathrm{mix}}$.

On the other hand, when $u_1, \ldots, u_d$ form an orthonormal basis of $\mathcal{H}$, the vector
$|x\rangle = \sum_i x^i|u_i\rangle$ is called a quantum-mechanical superposition of $u_1, \ldots, u_d$. Note
that this is different from the probabilistic mixture discussed above. The probabilistic
mixture is independent of the choice of the orthonormal basis. However, the quantum-
mechanical superposition depends on the choice of the basis, which depends on the
physical properties of the system under consideration.

When the operators $M_\omega$ in a POVM $M = \{M_\omega\}$ are projection matrices, i.e.,
$M_\omega^2 = M_\omega$, the POVM is called a projection valued measure (PVM) (only PVMs
are examined in elementary courses in quantum mechanics). This is equivalent to
$M_\omega M_{\omega'} = 0$ for different $\omega$, $\omega'$ (Exe. 1.10). Hermitian matrices are sometimes referred to
as “observables” or “physical quantities.” We now explain the reason.

Let the eigenvalues of a Hermitian matrix $X$ be $x^i$, and the projection matrices
corresponding to the respective eigenspaces be $E_{X,i}$, i.e., $X = \sum_i x^i E_{X,i}$. The right-hand side of
this equation is called the spectral decomposition of $X$. The decomposition $E_X =
\{E_{X,i}\}$ is then a PVM. When more than one eigenvector corresponds to a single
eigenvalue, the diagonalization $X = \sum_{i=1}^d x^i|u_i\rangle\langle u_i|$ is not unique, while the spectral
decomposition is unique. Suppose that a measurement corresponding to a PVM $E_X$
is applied to the quantum system $\mathcal{H}$ with the state $\rho$. Then, by using (1.10), the expectation
and variance of the measurement outcome may be calculated to be $\mathrm{Tr}\,\rho X$ and $\mathrm{Tr}\,\rho X^2 -
(\mathrm{Tr}\,\rho X)^2$, respectively (Exe. 1.6). Note that these are expressed completely in terms of $X$ and
$\rho$. Therefore, we identify the Hermitian matrix $X$ with the PVM $E_X$ and refer to it as the
measurement for the Hermitian matrix $X$.
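To make the identification concrete, the sketch below builds the spectral decomposition numerically and compares the mean and variance of the outcome distribution with the trace formulas. It assumes NumPy; the helper `spectral_projectors`, the observable, and the state are illustrative choices, and eigenvalues closer than a tolerance are treated as equal.

```python
import numpy as np

def spectral_projectors(X, tol=1e-9):
    """Return [(x_i, E_i), ...]: distinct eigenvalues of Hermitian X with their projections."""
    eigvals, U = np.linalg.eigh(X)          # eigenvalues in ascending order
    groups = []
    for x, u in zip(eigvals, U.T):          # rows of U.T are the eigenvectors
        P = np.outer(u, u.conj())
        if groups and abs(groups[-1][0] - x) < tol:
            # Same eigenvalue as the previous vector: enlarge the projection.
            groups[-1] = (groups[-1][0], groups[-1][1] + P)
        else:
            groups.append((x, P))
    return groups

X = np.array([[0.0, 1.0], [1.0, 0.0]])        # illustrative observable
rho = np.array([[0.65, 0.30], [0.30, 0.35]])  # illustrative state

decomposition = spectral_projectors(X)
values = np.array([x for x, _ in decomposition])
probs = np.array([np.trace(rho @ E).real for _, E in decomposition])

# Mean and variance of the outcome distribution obtained from (1.10).
mean = float(values @ probs)
variance = float((values ** 2) @ probs - mean ** 2)

# The same quantities written directly in terms of X and rho.
mean_trace = np.trace(rho @ X).real
variance_trace = np.trace(rho @ X @ X).real - mean_trace ** 2

print(np.isclose(mean, mean_trace), np.isclose(variance, variance_trace))  # True True
```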

When two Hermitian matrices $X$ and $Y$ commute with each other, we can use a common
orthonormal basis $u_1, \ldots, u_d$ and two sets of real numbers $\{x^i\}$, $\{y^i\}$ to diagonalize
the matrices as

\[
X = \sum_i x^i|u_i\rangle\langle u_i|, \quad Y = \sum_i y^i|u_i\rangle\langle u_i|. \tag{1.12}
\]

Then, the observables $X$ and $Y$ can be measured simultaneously by using the PVM
$\{|u_i\rangle\langle u_i|\}_i$. Evidently, if all $X_1, \ldots, X_k$ commute, it is also possible to diagonalize
them using the common basis.

In general, the elements $M_\omega$ of the PVM $M = \{M_\omega\}$ and the state $\rho$ do not necessarily
commute. This noncommutativity often causes many difficulties in their
mathematical treatment. To avoid these difficulties, we sometimes use the pinching
map $\kappa_M$ defined by (Exe. 1.11)

\[
\kappa_M(\rho) \overset{\mathrm{def}}{=} \sum_\omega M_\omega \rho M_\omega. \tag{1.13}
\]

This is because the pinching map $\kappa_M$ modifies the state $\rho$ such that the state becomes
commutative with $M_\omega$. Hence, the pinching map is an important tool for overcoming
the difficulties associated with noncommutativities. We often treat the case when
PVM $M$ is expressed as the spectral decomposition of a Hermitian matrix $X$. In such
a case, we use the shorthand $\kappa_X$ instead of $\kappa_{E_X}$. That is,

\[
\kappa_X \overset{\mathrm{def}}{=} \kappa_{E_X}. \tag{1.14}
\]

For a general POVM $M$, we may define (Exe. 1.11)

\[
\kappa_M(\rho) \overset{\mathrm{def}}{=} \sum_\omega \sqrt{M_\omega}\,\rho\sqrt{M_\omega}. \tag{1.15}
\]

Note that this operation does not necessarily have the same effect as making the
matrices commute.

Exercises 


1.6 Show that when one performs a PVM $E_X$ on a system in a state $\rho$, the expectation
and the variance are given by $\mathrm{Tr}\,\rho X$ and $\mathrm{Tr}\,\rho X^2 - (\mathrm{Tr}\,\rho X)^2$, respectively.


1.7 Show that $\lambda\rho_1 + (1 - \lambda)\rho_2$ is a density matrix for $\lambda \in (0, 1)$ when $\rho_1$ and $\rho_2$
are density matrices.


1.8 Suppose that a pure state $\rho$ is written as $\rho = \lambda\rho_1 + (1 - \lambda)\rho_2$ with a real
number $\lambda \in (0, 1)$ and two density matrices $\rho_1$ and $\rho_2$. Show that $\rho_1 = \rho_2 = \rho$.


1.9 Let $X$ and $Y$ be positive semidefinite matrices. Show that $XY = 0$ if and only if
$\mathrm{Tr}\,XY = 0$.


1.10 Let $M = \{M_\omega\}$ be a POVM. Show that $M = \{M_\omega\}$ is a PVM if and only if
$M_\omega M_{\omega'} = 0$ for different $\omega$, $\omega'$.


1.11 Show that $\sum_\omega \sqrt{M_\omega}\,\rho\sqrt{M_\omega}$ is a density matrix for a density matrix $\rho$ and a
POVM $M$. In particular, show that $\sum_\omega M_\omega \rho M_\omega$ is a density matrix when $M$ is a
PVM.


1.3 Quantum Two-Level Systems 


A quantum system with a two-dimensional representation space is called a quantum
two-level system or a qubit, which is the abbreviation for quantum bit. This is a
particularly important special case for examining a general quantum system. The
spin-1/2 system, which represents a particle with a total angular momentum of 1/2,
is the archetypical quantum two-level system. The electron is an example of such
a spin-1/2 system. Strictly speaking, a spin-1/2 system is a specific case of angular
momentum in a real system; however, the term is sometimes used for any quantum
system with two levels. In particle physics, one comes across a quantum system of
isospin, which does not correspond to the motional degrees of freedom but to purely
internal degrees of freedom.

Mathematically, they can be treated in the same way as spin-1/2 systems. In this
text, since we are interested in the general structure of quantum systems, we will use
the term quantum two-level system to refer generically to all such systems.

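In particular, the Hermitian matrices $S_0$, $S_1$, $S_2$, and $S_3$ given below are called
Pauli matrices:

\[
S_0 = I, \quad S_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad S_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad S_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{1.16}
\]

They will help to simplify the expressions of matrices. The density matrix can be
parameterized by using the Pauli matrices:

\[
\rho_x = \frac{1}{2}\begin{pmatrix} 1 + x^3 & x^1 - x^2 i \\ x^1 + x^2 i & 1 - x^3 \end{pmatrix} = \frac{1}{2}\left(S_0 + \sum_{i=1}^3 x^i S_i\right), \tag{1.17}
\]

which is called Stokes parameterization. The range of $x = (x^1, x^2, x^3)$ is the closed unit
ball $\{x \mid \sum_{i=1}^3 (x^i)^2 \le 1\}$, which is called the Bloch sphere (Fig. 1.2) (Exe. 1.12).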
We often focus on the basis

\[
e_0 \overset{\mathrm{def}}{=} \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad e_1 \overset{\mathrm{def}}{=} \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\]

in the space $\mathbb{C}^2$. If there are several representation spaces $\mathcal{H}_A$, $\mathcal{H}_B$, etc. equivalent to
$\mathbb{C}^2$, and we wish to specify the space of the basis $e_0$, $e_1$, we will write $e_i^A$, $e_i^B$, etc. In
this case, the Pauli matrix $S_i$ and the identity matrix will be denoted by $S_i^A$ and $I_A$,
respectively.

Fig. 1.2 Bloch sphere

Next, we consider the measurements in a quantum two-level system. The measurement
of the observable $S_1$ is given by the PVM $E_{S_1} = \{E_1, E_{-1}\}$, where

\[
E_1 = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad E_{-1} = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.
\]

Given a density matrix $\rho_x$, the probability $P_{\rho_x}^{E_{S_1}}(1)$ of obtaining the measurement
outcome 1 is $\mathrm{Tr}\,\rho_x E_1 = \frac{1 + x^1}{2}$. Similarly, the probability $P_{\rho_x}^{E_{S_1}}(-1)$ of obtaining the
measurement outcome $-1$ is $\frac{1 - x^1}{2}$. A more detailed treatment of quantum two-level
systems will be deferred until Sect. 5.3.
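The Stokes parameterization (1.17) and the outcome probabilities of the measurement of S_1 can be checked with a few lines of NumPy. In this sketch the function names `rho_from_bloch` and `bloch_from_rho` and the chosen vector x are illustrative assumptions.

```python
import numpy as np

S0 = np.eye(2, dtype=complex)
S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, -1j], [1j, 0]])
S3 = np.array([[1, 0], [0, -1]], dtype=complex)

def rho_from_bloch(x):
    """Density matrix (1.17) for x = (x^1, x^2, x^3)."""
    x1, x2, x3 = x
    return 0.5 * (S0 + x1 * S1 + x2 * S2 + x3 * S3)

def bloch_from_rho(rho):
    """Recover x^i = Tr(rho S_i), using Tr(S_i S_j) = 2 delta_ij."""
    return np.array([np.trace(rho @ S).real for S in (S1, S2, S3)])

x = np.array([0.6, 0.0, 0.3])         # illustrative point inside the unit ball
rho = rho_from_bloch(x)

# PVM of the observable S_1.
E_plus = 0.5 * (S0 + S1)
E_minus = 0.5 * (S0 - S1)
p_plus = np.trace(rho @ E_plus).real       # equals (1 + x^1) / 2
p_minus = np.trace(rho @ E_minus).real     # equals (1 - x^1) / 2
det_rho = np.linalg.det(rho).real          # equals (1 - |x|^2) / 4

print(p_plus, p_minus)
print(np.allclose(bloch_from_rho(rho), x))  # True
```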


Exercises 


1.12 Verify that the set $\{x = (x^1, x^2, x^3) \mid \rho_x \ge 0\}$ is equal to $\{x \mid \sum_{i=1}^3 (x^i)^2 \le 1\}$ by
showing $\det \rho_x = \frac{1 - \|x\|^2}{4}$.

1.13 Verify that $\rho_x$ is a pure state if and only if $\|x\| = 1$.


1.14 Show that all $2 \times 2$ Hermitian matrices with trace 0 can be written as a linear
combination of $S_1$, $S_2$, and $S_3$.


1.4 Composite Systems and Tensor Products 


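A combined system composed of two systems $\mathcal{H}_A$ and $\mathcal{H}_B$ is called the composite
system of $\mathcal{H}_A$ and $\mathcal{H}_B$. When the system $\mathcal{H}_A$ ($\mathcal{H}_B$) has an orthonormal basis
$\{u_1^A, \ldots, u_{d_A}^A\}$ ($\{u_1^B, \ldots, u_{d_B}^B\}$), respectively, the representation space of the composite
system is given by the Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$ with the orthonormal basis
$\{u_1^A \otimes u_1^B, \ldots, u_1^A \otimes u_{d_B}^B, u_2^A \otimes u_1^B, \ldots, u_2^A \otimes u_{d_B}^B, \ldots, u_{d_A}^A \otimes u_1^B, \ldots, u_{d_A}^A \otimes u_{d_B}^B\}$. The
space $\mathcal{H}_A \otimes \mathcal{H}_B$ is called the tensor product space of $\mathcal{H}_A$ and $\mathcal{H}_B$; its dimension
is $d_A \times d_B$. Using $d_A \times d_B$ complex numbers $(z^{i,j})$, the elements of $\mathcal{H}_A \otimes \mathcal{H}_B$ may
be written as $\sum_{i,j} z^{i,j} u_i^A \otimes u_j^B$. The tensor product of two vectors $u^A = \sum_k x^k u_k^A$ and
$u^B = \sum_j y^j u_j^B$ is defined as $u^A \otimes u^B \overset{\mathrm{def}}{=} \sum_k \sum_j x^k y^j u_k^A \otimes u_j^B$. We simplify this notation
by writing $|u^A \otimes u^B\rangle$ as $|u^A, u^B\rangle$.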

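The tensor product $X_A \otimes X_B$ of a matrix $X_A$ on $\mathcal{H}_A$ and a matrix $X_B$ on $\mathcal{H}_B$ is
defined as a matrix on $\mathcal{H}_A \otimes \mathcal{H}_B$ by

\[
X_A \otimes X_B(u_i \otimes v_j) \overset{\mathrm{def}}{=} X_A(u_i) \otimes X_B(v_j).
\]

The trace of this tensor product satisfies the relation (Exe. 1.15)

\[
\mathrm{Tr}\,X_A \otimes X_B = \mathrm{Tr}\,X_A \cdot \mathrm{Tr}\,X_B. \tag{1.18}
\]

Two matrices $X_A$ and $Y_A$ on $\mathcal{H}_A$ and two matrices $X_B$ and $Y_B$ on $\mathcal{H}_B$ satisfy

\[
(X_A \otimes X_B)(Y_A \otimes Y_B) = (X_A Y_A) \otimes (X_B Y_B).
\]

Hence it follows that

\[
\mathrm{Tr}\,(X_A \otimes X_B)(Y_A \otimes Y_B) = \mathrm{Tr}(X_A Y_A) \cdot \mathrm{Tr}(X_B Y_B).
\]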

If the systems $\mathcal{H}_A$ and $\mathcal{H}_B$ are independent and their states are represented by the
densities $\rho_A$ and $\rho_B$, respectively, then the state of the composite system may be
represented by the tensor product of the density matrices $\rho_A \otimes \rho_B$ (Exe. 1.19). Such a state
is called a tensor product state.

When the density matrix $\rho$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ can be written as
a probabilistic mixture of tensor product states, it is called separable:

\[
\rho = \sum_i p_i \rho_A^i \otimes \rho_B^i, \quad p_i \ge 0, \quad \sum_i p_i = 1, \quad \rho_A^i \in \mathcal{S}(\mathcal{H}_A), \quad \rho_B^i \in \mathcal{S}(\mathcal{H}_B). \tag{1.19}
\]

Such separable states do not have a typical quantum-mechanical correlation
(entanglement)³. When $\rho$ is a pure state $|x\rangle\langle x|$, it is separable if and only if the vector $|x\rangle$
has a tensor product form $|u^A, u^B\rangle$. When a state $\rho$ does not have the form (1.19), it
is called an entangled state.

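When all the $n$ systems are identical to $\mathcal{H}$, their composite system is written as
$\underbrace{\mathcal{H} \otimes \cdots \otimes \mathcal{H}}_{n}$; this will be denoted by $\mathcal{H}^{\otimes n}$ for brevity. In particular, if all the quantum
systems are independent, and the state in each system is given by $\rho$, the composite
state on $\mathcal{H}^{\otimes n}$ is $\underbrace{\rho \otimes \cdots \otimes \rho}_{n}$, which is denoted by $\rho^{\otimes n}$. Such states can be regarded
as quantum versions of independent and identical distributions (discussed later).

Let us now focus on the composite state of the quantum two-level systems $\mathcal{H}_A$
and $\mathcal{H}_B$. By defining $e_0^{A,B} \overset{\mathrm{def}}{=} \frac{1}{\sqrt{2}}(e_0^A \otimes e_0^B + e_1^A \otimes e_1^B)$, we see that $|e_0^{A,B}\rangle\langle e_0^{A,B}|$ is not
separable, i.e., it is an entangled state. Other entangled states include

\[
\begin{aligned}
e_1^{A,B} &\overset{\mathrm{def}}{=} (S_1^A \otimes I_B)e_0^{A,B} = \frac{1}{\sqrt{2}}(e_1^A \otimes e_0^B + e_0^A \otimes e_1^B), \\
e_2^{A,B} &\overset{\mathrm{def}}{=} (S_2^A \otimes I_B)e_0^{A,B} = \frac{i}{\sqrt{2}}(e_1^A \otimes e_0^B - e_0^A \otimes e_1^B), \\
e_3^{A,B} &\overset{\mathrm{def}}{=} (S_3^A \otimes I_B)e_0^{A,B} = \frac{1}{\sqrt{2}}(e_0^A \otimes e_0^B - e_1^A \otimes e_1^B).
\end{aligned} \tag{1.20}
\]
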

3In fact, even a separable state has a kind of quantum-mechanical correlation, which is called 
discord and is discussed in Sect. 8.10. This kind of correlation is measured by the quantity defined 
in (8.177). 
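They are mutually orthogonal, i.e.,

\[
\langle e_k^{A,B}|e_l^{A,B}\rangle = \delta_{k,l}. \tag{1.21}
\]

In general, any vector $|x\rangle$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ can be expressed as $|x\rangle = |\sum_{i,j} x^{i,j} u_i^A \otimes u_j^B\rangle =
(X \otimes I_B)|\sum_i u_i^B \otimes u_i^B\rangle$ by a linear map $X = \sum_{i,j} x^{i,j}|u_i^A\rangle\langle u_j^B|$ from $\mathcal{H}_B$ to $\mathcal{H}_A$. Since
we can identify the vector $|x\rangle$ by using the linear map $X$, we denote it by $|X\rangle$.⁴ So,
we obtain the following properties (Exe. 1.17, 1.18):

\[
(Y \otimes Z^T)|X\rangle = |YXZ\rangle, \tag{1.22}
\]
\[
\langle Y|X\rangle = \mathrm{Tr}\,Y^*X. \tag{1.23}
\]

In particular, when $\sqrt{d}X$ is a unitary matrix, $|X\rangle\langle X|$ is called a maximally entangled
state of size $d$. Also, an entangled state is called a partially entangled state when
it is not a maximally entangled state. In this book, we denote the vector $|\frac{1}{\sqrt{d}}I\rangle$ by
$|\Phi_d\rangle$. Then, we have $|u^A, u^B\rangle = |(|u^A\rangle\langle\overline{u^B}|)\rangle$, where $\overline{u^B}$ is the vector whose coefficients
in the basis $\{u_j^B\}$ are the complex conjugates of those of $u^B$. So, we find that the vector $|X\rangle$ has a
tensor product form if and only if the matrix $X$ is written in the form $|u^A\rangle\langle\overline{u^B}|$. This
condition is equivalent to the condition that the matrix $X$ is a rank-one matrix.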
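The correspondence between matrices and vectors on the composite system can be tested numerically. In the sketch below, assuming NumPy and the product-basis ordering used by `np.kron`, the vector |X> is the row-major flattening of the matrix X; the helper names `ket` and `random_matrix` and the random test data are illustrative assumptions.

```python
import numpy as np

def ket(X):
    """Vector |X> = sum_{i,j} x^{i,j} u_i^A (x) u_j^B, i.e. row-major flattening of X."""
    return np.asarray(X, dtype=complex).reshape(-1)

rng = np.random.default_rng(1)

def random_matrix(rows, cols):
    return rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))

X = random_matrix(2, 3)   # linear map from H_B (dimension 3) to H_A (dimension 2)
W = random_matrix(2, 3)
Y = random_matrix(2, 2)   # matrix on H_A
Z = random_matrix(3, 3)   # matrix on H_B

# (1.22): (Y (x) Z^T)|X> = |Y X Z>.
lhs_122 = np.kron(Y, Z.T) @ ket(X)
rhs_122 = ket(Y @ X @ Z)

# (1.23): <W|X> = Tr W^* X.
inner_123 = np.vdot(ket(W), ket(X))
trace_123 = np.trace(W.conj().T @ X)

# The four vectors of (1.20) are |S_k / sqrt(2)>, so (1.21) follows from (1.23).
paulis = [np.eye(2),
          np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]])]
bell = [ket(S / np.sqrt(2)) for S in paulis]
gram = np.array([[np.vdot(a, b) for b in bell] for a in bell])

print(np.allclose(lhs_122, rhs_122))        # True
print(np.isclose(inner_123, trace_123))     # True
print(np.allclose(gram, np.eye(4)))         # True
```

A product vector corresponds to a rank-one matrix: `np.kron(a, b)` coincides with `ket(np.outer(a, b))` for any vectors a and b.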
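Next, let us consider the independent applications of the measurements
$M_A = \{M_{A,\omega_A}\}_{\omega_A\in\Omega_A}$ and $M_B = \{M_{B,\omega_B}\}_{\omega_B\in\Omega_B}$ on systems $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively.
This is equivalent to performing a measurement $M_A \otimes M_B \overset{\mathrm{def}}{=} \{M_{A,\omega_A} \otimes
M_{B,\omega_B}\}_{(\omega_A,\omega_B)\in\Omega_A\times\Omega_B}$ on the composite system. Such a measurement is called an independent
measurement. If a measurement $M = \{M_\omega\}_{\omega\in\Omega}$ on the composite system
$\mathcal{H}_A \otimes \mathcal{H}_B$ has the form

\[
M_\omega = M_{A,\omega} \otimes M_{B,\omega}, \quad M_{A,\omega} \ge 0, \quad M_{B,\omega} \ge 0 \tag{1.24}
\]

or the form

\[
M_\omega = \sum_i M_{A,\omega,i} \otimes M_{B,\omega,i}, \quad M_{A,\omega,i} \ge 0, \quad M_{B,\omega,i} \ge 0,
\]

the measurement $M$ is said to be separable. Otherwise, it is called collective. Of
course, independent measurements are always separable, but the converse is not
always true.

Since the vectors $e_0^{A,B}, \ldots, e_3^{A,B}$ defined previously form an orthonormal basis
in the composite system $\mathbb{C}^2 \otimes \mathbb{C}^2$, the set $\{|e_0^{A,B}\rangle\langle e_0^{A,B}|, \ldots, |e_3^{A,B}\rangle\langle e_3^{A,B}|\}$ is a PVM.
This measurement is a collective measurement because it does not have the separable
form (1.24).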


4A notation similar to |X) was introduced in [1]. However, the relations (1.22) and (1.23) were 
essentially pointed out in [2]. 
On the other hand, adaptive measurements are known as a class of separable
POVMs, and their definition is given as follows.⁵ Suppose that we perform
a measurement $M_A = \{M_{A,\omega_A}\}_{\omega_A\in\Omega_A}$ on system $\mathcal{H}_A$ and then another measurement
$M_B^{\omega_A} = \{M_{B,\omega_B}^{\omega_A}\}_{\omega_B\in\Omega_B}$ on system $\mathcal{H}_B$ according to the measurement outcome $\omega_A$. The
POVM of this measurement on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ is given as

\[
\{M_{A,\omega_A} \otimes M_{B,\omega_B}^{\omega_A}\}_{(\omega_A,\omega_B)\in\Omega_A\times\Omega_B}. \tag{1.25}
\]

Such a measurement is called adaptive, and it satisfies the separable condition (1.24).
Presently, it is not clear how different the adaptive condition (1.25) is from the
separable condition (1.24). In Chaps. 3 and 4, we focus on the restriction of our
measurements to separable or adaptive measurements and discuss the extent of its
effects on the performance of information processing.

Similarly, a separable measurement $M = \{M_\omega\}_{\omega\in\Omega}$ in the composite system $\mathcal{H}_1 \otimes
\cdots \otimes \mathcal{H}_n$ of $n$ systems $\mathcal{H}_1, \ldots, \mathcal{H}_n$ is given by

\[
M_\omega = M_{1,\omega} \otimes \cdots \otimes M_{n,\omega}, \quad M_{1,\omega} \ge 0, \ \ldots, \ M_{n,\omega} \ge 0.
\]

An adaptive measurement may be written in terms of a POVM as

\[
\{M_{1,\omega_1} \otimes \cdots \otimes M_{n,\omega_n}^{\omega_1,\ldots,\omega_{n-1}}\}_{(\omega_1,\ldots,\omega_n)\in\Omega_1\times\cdots\times\Omega_n}.
\]

We also denote $n$ applications of the POVM $M$ on the composite system $\mathcal{H}^{\otimes n}$ by
$M^{\otimes n}$.

Consider a composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ in a state $\rho \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$. Assume that
we can directly access only system $\mathcal{H}_A$ for performing measurements. In this case,
we would only be interested in the state of system $\mathcal{H}_A$, and the density matrix on $\mathcal{H}_A$
is given by the reduced density matrix $\mathrm{Tr}_{\mathcal{H}_B}\rho \in \mathcal{S}(\mathcal{H}_A)$, which is defined to satisfy⁶

\[
\mathrm{Tr}\,(\mathrm{Tr}_{\mathcal{H}_B}\rho)X = \mathrm{Tr}\,(X \otimes I_{\mathcal{H}_B})\rho. \tag{1.26}
\]

We often abbreviate $\mathrm{Tr}_{\mathcal{H}_B}\rho$ to $\rho_A$. Then, $\mathrm{Tr}_{\mathcal{H}_B}$ can be regarded as a map from the
density on the composite system to the reduced density matrix and is called a partial
trace, often abbreviated to $\mathrm{Tr}_B$. To specify the space on which the trace is acting, we
denote the trace by $\mathrm{Tr}_A$ even if it is a full trace. The partial trace can be calculated
according to


5Adaptive measurements are often called one-way LOCC measurements in entanglement theory.
See Sect. 8.1.

6The uniqueness of this definition can be shown as follows. Consider the linear map $X \mapsto \mathrm{Tr}\,(X \otimes
I_{\mathcal{H}_B})\rho$ on the set of Hermitian matrices on $\mathcal{H}_A$. Since the inner product $(X, Y) \mapsto \mathrm{Tr}\,XY$ is
non-degenerate on the set of Hermitian matrices on $\mathcal{H}_A$, there uniquely exists a Hermitian matrix $Y$
satisfying (1.26).
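\[
\rho^{i,j} = \sum_{k=1}^{d_B} \langle u_i^A \otimes u_k^B|\rho|u_j^A \otimes u_k^B\rangle, \quad \mathrm{Tr}_B\,\rho = \sum_{i,j} \rho^{i,j}|u_i^A\rangle\langle u_j^A|, \tag{1.27}
\]

where the orthonormal basis of $\mathcal{H}_A$ ($\mathcal{H}_B$) is $u_1^A, \ldots, u_{d_A}^A$ ($u_1^B, \ldots, u_{d_B}^B$). This may also
be written as

\[
\mathrm{Tr}_B\,\rho = \sum_{i,j}\left(\sum_k \rho^{(i,k),(j,k)}\right)|u_i^A\rangle\langle u_j^A|, \tag{1.28}
\]

where $\rho = \sum_{i,j,k,l} \rho^{(i,k),(j,l)}|u_i^A, u_k^B\rangle\langle u_j^A, u_l^B|$. We may also write

\[
\mathrm{Tr}_B\,\rho = \sum_{k=1}^{d_B} \mathrm{Tr}_B\,P_k\rho P_k, \tag{1.29}
\]

where $P_k$ is a projection from $\mathcal{H}_A \otimes \mathcal{H}_B$ to $\mathcal{H}_A \otimes \langle u_k^B\rangle$, where we denote the linear
space spanned by the vector $u_k^B$ by $\langle u_k^B\rangle$. Further, for a given vector $|v\rangle \in \mathcal{H}_B$, we use
the notation

\[
\langle v|\rho|v\rangle := \mathrm{Tr}_B\,\rho\,(I_A \otimes |v\rangle\langle v|). \tag{1.30}
\]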
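Formula (1.28) translates directly into array operations. The following NumPy sketch, with the hypothetical helper name `partial_trace_B`, assumes the product basis is ordered so that the pair (i, k) has index i * d_B + k, which is the ordering produced by `np.kron`. It checks the defining relation (1.26) on illustrative states.

```python
import numpy as np

def partial_trace_B(rho, dim_A, dim_B):
    """Tr_B rho via (1.28): sum the entries rho^{(i,k),(j,k)} over k."""
    blocks = rho.reshape(dim_A, dim_B, dim_A, dim_B)   # indices (i, k, j, l)
    return np.trace(blocks, axis1=1, axis2=3)          # contracts k with l

# Tensor product state: the partial trace returns the first factor.
rho_A = np.array([[0.75, 0.25], [0.25, 0.25]])
rho_B = np.diag([0.5, 0.3, 0.2])
reduced_product = partial_trace_B(np.kron(rho_A, rho_B), 2, 3)

# Maximally entangled two-qubit state: the reduced state is completely mixed.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_bell = np.outer(phi, phi.conj())
reduced_bell = partial_trace_B(rho_bell, 2, 2)

# Defining relation (1.26) for an illustrative Hermitian X on H_A.
X = np.array([[1.0, 2.0 - 1.0j], [2.0 + 1.0j, -0.5]])
lhs_126 = np.trace(reduced_bell @ X)
rhs_126 = np.trace(np.kron(X, np.eye(2)) @ rho_bell)

print(np.allclose(reduced_product, rho_A))        # True
print(np.allclose(reduced_bell, np.eye(2) / 2))   # True
print(np.isclose(lhs_126, rhs_126))               # True
```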
Exercises 
1.15 Show (1.18).

1.16 Show that $(X \otimes I_B)(I_A \otimes Y) = (I_A \otimes Y)(X \otimes I_B)$.

1.17 Show (1.22).

1.18 Show (1.23).


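1.19 Show that the following conditions for a state $\rho$ on the composite system
$\mathcal{H}_A \otimes \mathcal{H}_B$ are equivalent.

① The state $\rho$ has the tensor product form $\rho_A \otimes \rho_B$.

② Any normalized vector $|v\rangle \in \mathcal{H}_B$ satisfies $\frac{\langle v|\rho|v\rangle}{\mathrm{Tr}\,\langle v|\rho|v\rangle} = \mathrm{Tr}_B\,\rho$.

③ Assume that the independent measurement $M_A \otimes M_B$ of arbitrary POVMs $M_A =
\{M_{A,\omega_A}\}$ and $M_B = \{M_{B,\omega_B}\}$ is applied to the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ with
the state $\rho$. The measurement outcomes $\omega_A$ and $\omega_B$ are independent of each other.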


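1.20 Suppose that the spaces $\mathcal{H}_A$ and $\mathcal{H}_B$ also have other bases $\{v_1^A, \ldots, v_{d_A}^A\}$ and
$\{v_1^B, \ldots, v_{d_B}^B\}$ and that the unitary matrices $V_A = (v_A^{i,j})$ and $V_B = (v_B^{i,j})$ satisfy $v_j^A =
\sum_i v_A^{i,j}u_i^A$ and $v_j^B = \sum_i v_B^{i,j}u_i^B$. Show that $v_j^A \otimes v_k^B = \sum_{i,l} v_A^{i,j}v_B^{l,k}u_i^A \otimes u_l^B$. Hence, the
definition of the tensor product is independent of the choice of the bases on $\mathcal{H}_A$ and
$\mathcal{H}_B$.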
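1.21 Prove (1.21).

1.22 Prove formulas (1.27)–(1.29), which calculate the partial trace.


1.23 Consider two Hermitian matrices $\rho_A \ge 0$ and $\sigma_A \ge 0$ on $\mathcal{H}_A$ and two other Hermitian
matrices $\rho_B \ge 0$ and $\sigma_B \ge 0$ on $\mathcal{H}_B$. Show that the following two conditions
are equivalent, following the steps below.

① $[\rho_A \otimes \rho_B, \sigma_A \otimes \sigma_B] = 0$. (1.31)
② $(\mathrm{Tr}\,\sigma_B\rho_B)[\rho_A, \sigma_A] = 0$ and $(\mathrm{Tr}\,\sigma_A\rho_A)[\rho_B, \sigma_B] = 0$. (1.32)

(a) Show that (1.31) holds when $[\rho_A, \sigma_A] = [\rho_B, \sigma_B] = 0$.
(b) Show that (1.31) holds when $\mathrm{Tr}\,\rho_A\sigma_A = 0$.
(c) Show that (1.32) ⇒ (1.31).
(d) Show that (1.31) ⇒ (1.32).


1.24 Show that $\mathrm{Tr}_B\,X(I_A \otimes Y) = \mathrm{Tr}_B\,(I_A \otimes Y)X$, where $X$ is a matrix on $\mathcal{H}_A \otimes \mathcal{H}_B$
and $Y$ is a matrix on $\mathcal{H}_B$.


1.25 Further, show the following formula when $\rho$ and $\rho_0$ are states on $\mathcal{H}_A$ and $\mathcal{H}_B$:

\[
\mathrm{Tr}_B\,\sqrt{\rho \otimes \rho_0}\,[X, Y \otimes I_B]\,\sqrt{\rho \otimes \rho_0}
= \sqrt{\rho}\,[\mathrm{Tr}_B\,(I_A \otimes \sqrt{\rho_0})X(I_A \otimes \sqrt{\rho_0}), Y]\,\sqrt{\rho}.
\]


1.26 Let $P$ be a projection from $\mathcal{H}_A \otimes \mathcal{H}_B$ to the subspace $\{u^A \otimes u^B \mid u^B \in \mathcal{H}_B\}$ for
any element $u^A \in \mathcal{H}_A$. Show that

\[
\mathrm{Tr}_A\,(|u^A\rangle\langle u^A| \otimes I_B)X = \mathrm{Tr}_A\,PXP.
\]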


1.5 Matrix Inequalities and Matrix Monotone Functions 


In later chapters, we will encounter quantities such as error probabilities that require 
us to handle inequalities in various situations. Of course, probabilities such as error 
probabilities are real numbers. However, in quantum systems these probabilities are 
expressed in terms of matrices, as we show in (1.10). Therefore, it is often helpful 
to use inequalities involving matrices when evaluating probabilities. By using the 
definition of positive semidefiniteness defined in Sect. 1.1, we may define the order 
(matrix inequality) 


\[
X \ge Y \overset{\mathrm{def}}{\Longleftrightarrow} X - Y \ge 0 \tag{1.33}
\]

for two Hermitian matrices $X$ and $Y$ (Exe. 1.27). Such an order requires some care as it
may involve some unexpected pitfalls arising from the noncommutativity of $X$ and
$Y$. In order to examine this order in greater detail, let us first analyze the properties
of positive semidefiniteness again. Let $X$ be a $d \times d$ positive semidefinite ($\ge 0$) Hermitian
matrix and $Y$ be a $d \times d'$ matrix. It follows that $Y^*XY$ is a $d' \times d'$ positive
semidefinite Hermitian matrix. This can be verified from

\[
\langle v|Y^*XY|v\rangle = \langle Yv|X|Yv\rangle \ge 0,
\]

where $v$ is a vector of length $d'$. Furthermore, if $X_1$ and $X_2$ are two $d \times d$ Hermitian
matrices satisfying $X_1 \ge X_2$, it follows that

\[
Y^*X_1Y \ge Y^*X_2Y. \tag{1.34}
\]


Now, we define another type of product

\[
X \circ Y := \frac{1}{2}(XY + YX).
\]

If the matrices commute, then some additional types of matrix inequalities hold.
For example, if $d \times d$ positive semidefinite Hermitian matrices $X$ and $Y$ commute,
then (Exe. 1.28)

\[
X \circ Y \ge 0. \tag{1.35}
\]

Inequality (1.35) does not hold unless $X$ and $Y$ commute. A simple counterexample
exists for the noncommuting case (Exe. 1.30).

Let $X_1$ and $X_2$ be two $d \times d$ Hermitian matrices satisfying $X_1 \ge X_2 \ge 0$, and $Y$
be a $d \times d$ positive semidefinite Hermitian matrix. When $Y$ is commutative with $X_1$
and $X_2$, we have (Exe. 1.29)

\[
X_1YX_1 \ge X_2YX_2. \tag{1.36}
\]

Inequality (1.36) does not hold unless all matrices commute (Exe. 1.31). In general, when
noncommutativity is involved, matrix inequalities are more difficult to handle and
should therefore be treated with care.

Let us now define the projection $\{X \ge 0\}$ with respect to a Hermitian matrix $X$
with a spectral decomposition $X = \sum_i x_i E_{X,i}$ [3]:

\[
\{X \ge 0\} \overset{\mathrm{def}}{=} \sum_{x_i \ge 0} E_{X,i}. \tag{1.37}
\]

Consider the probability of the set $\{x_i \ge 0\}$ containing the measurement outcome
for a measurement corresponding to the spectral decomposition $\{E_{X,i}\}$ of $X$. This
probability is $\sum_{x_i \ge 0} \mathrm{Tr}\,\rho E_{X,i} = \mathrm{Tr}\,\rho\{X \ge 0\}$ when the state is given as a density $\rho$.
Therefore, this notation generalizes the concept of the subset to the noncommuting
case. In other words, the probability $\mathrm{Tr}\,\rho\{X \ge 0\}$ can be regarded as a generalization
of the probability $p\{\omega \in \Omega \mid X(\omega) \ge 0\}$, where $p$ is a probability distribution and $X$ is
a random variable. Then, we define $X_+$ as $X\{X \ge 0\}$. Similarly, we may also define
$\{X > 0\}$, $\{X \le 0\}$, $\{X < 0\}$, and $\{X \ne 0\}$. Further, given two Hermitian matrices
$X$ and $Y$, we define the projections $\{X > Y\}$, $\{X \le Y\}$, $\{X < Y\}$, and $\{X \ne Y\}$ as
$\{X - Y > 0\}$, $\{X - Y \le 0\}$, $\{X - Y < 0\}$, and $\{X - Y \ne 0\}$. Further, we define the
matrix $(X)_+ := X\{X \ge 0\}$.

If two Hermitian matrices $X$ and $Y$ commute, we obtain the matrix inequality (Exe. 1.28)

\[
\{X \ge 0\} + \{Y \ge 0\} \ge \{X + Y \ge 0\} \tag{1.38}
\]

in the sense defined above. The range of the projection $\{X \ne 0\}$ is called the support
of $X$. If the projection $\{X \ne 0\}$ is not equal to $I$, then the matrix $X$ does not have an
inverse matrix. In this case, the Hermitian matrix $Y$ satisfying $XY = YX = \{X \ne 0\}$
is called the generalized inverse matrix of a Hermitian matrix $X$. It should be
noted that inequality (1.38) is not generally true unless $X$ and $Y$ commute. It is known that two
noncommutative Hermitian matrices $X$ and $Y$ cannot be diagonalized simultaneously.
This fact often causes many technical difficulties in the above method.
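The projection {X ≥ 0}, the positive part, the support, and the generalized inverse can all be computed from an eigendecomposition. The sketch below assumes NumPy; the function names, the tolerance used to decide which eigenvalues count as zero, and the example matrices are illustrative assumptions.

```python
import numpy as np

def projection_nonneg(X, tol=1e-12):
    """Projection {X >= 0}: sum of eigenprojections with eigenvalue x_i >= 0."""
    w, U = np.linalg.eigh(X)
    Uk = U[:, w >= -tol]
    return Uk @ Uk.conj().T

def support_and_generalized_inverse(X, tol=1e-12):
    """Return the projection {X != 0} and the Hermitian Y with XY = YX = {X != 0}."""
    w, U = np.linalg.eigh(X)
    nonzero = np.abs(w) > tol
    Un = U[:, nonzero]
    support = Un @ Un.conj().T
    ginv = (Un / w[nonzero]) @ Un.conj().T     # divide column j by its eigenvalue
    return support, ginv

X = np.array([[0.0, 1.0], [1.0, 0.0]])         # eigenvalues -1 and +1
rho = np.array([[0.75, 0.25], [0.25, 0.25]])   # illustrative state

P = projection_nonneg(X)
X_plus = X @ P                                 # positive part X{X >= 0}
prob_nonneg = np.trace(rho @ P).real           # Tr rho {X >= 0}

Z = np.array([[1.0, 1.0], [1.0, 1.0]])         # singular matrix, eigenvalues 0 and 2
support, Z_ginv = support_and_generalized_inverse(Z)

print(prob_nonneg)                             # 0.75
print(np.allclose(Z @ Z_ginv, support))        # True
```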

We now examine matrix monotone functions, which are useful for dealing with
matrix inequalities. Given a function $f$, which maps a real number to a real number,
we denote the Hermitian matrix $\sum_i f(x_i)E_{X,i}$ by $f(X)$ with respect to a Hermitian
matrix $X = \sum_i x_i E_{X,i}$.

$f$ is called a matrix monotone function in $S \subset \mathbb{R}$ if $f(X) \ge f(Y)$ for two Hermitian
matrices $X$ and $Y$ satisfying $X \ge Y$ with eigenvalues in $S$. Some known matrix monotone
functions in $[0, \infty)$ are, for example, $f(x) = x^s$ ($0 < s \le 1$), and those in $(0, \infty)$ are
$f(x) = \log x$ and $f(x) = -1/x$ [4]. See Exercise A.7 for the $s = 1/2$ case. Since the
function $f(x) = -x^{-s}$ ($0 < s \le 1$) is the composite function of $-1/x$ and $x^s$, it is also
a matrix monotone function. Note that the function $f(x) = x^s$ ($s > 1$) ($f(x) = x^2$,
etc.) is not matrix monotone (Exe. 1.32).
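The difference between the square root and the square can be seen on a small numerical example. The sketch below assumes NumPy; the helper names and the pair A ≥ B ≥ 0 are illustrative choices made for this sketch. It applies a function to a Hermitian matrix through its eigenvalues, as in the definition of f(X), and inspects the smallest eigenvalue of f(A) - f(B).

```python
import numpy as np

def apply_function(f, X):
    """f(X) = sum_i f(x_i) E_{X,i} for Hermitian X, via an eigendecomposition."""
    w, U = np.linalg.eigh(X)
    return (U * f(w)) @ U.conj().T             # scale column i of U by f(x_i)

def min_eigenvalue(X):
    return np.linalg.eigvalsh(X).min()

def safe_sqrt(w):
    """Square root that clips tiny negative rounding errors to zero."""
    return np.sqrt(np.clip(w, 0.0, None))

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [1.0, 1.0]])         # A - B = diag(1, 0) >= 0

gap_square = min_eigenvalue(apply_function(np.square, A) - apply_function(np.square, B))
gap_sqrt = min_eigenvalue(apply_function(safe_sqrt, A) - apply_function(safe_sqrt, B))

print(gap_square < 0)    # True: A >= B does not imply A^2 >= B^2
print(gap_sqrt >= 0)     # True: consistent with monotonicity of x^(1/2)
```

For this pair the smallest eigenvalue of A^2 - B^2 is (3 - sqrt(13))/2, which is negative.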


Exercises 
1.27 Show that the order $\ge$ defined in (1.33) satisfies the axiom of order, which is
equivalent to the following conditions.


(a) When $X \ge Y$ and $Y \ge Z$, then $X \ge Z$.
(b) When $X \ge Y$ and $Y \ge X$, then $X = Y$.


1.28 Suppose that $X$ and $Y$ commute. Show inequalities (1.35) and (1.38) using
(1.12).


1.29 Verify inequality (1.36) when $Y$ is commutative with $X_1$ and $X_2$.


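1.30 Show that $X = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, $Y = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ form a counterexample to (1.35).

1.31 Show that $X_1 = I$, $X_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, $Y = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ form a counterexample to
(1.36).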


1.32 Verify that the following X and Y provide a counterexample to f(x) = x? as a 
matrix monotone function: 


1.33 Show thatrank{X — xJ > 0} > rank P whena Hermitian matrix X, a projection 
P, and real number x satisfy X > xP. 


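1.34 Show that $\mathrm{Tr}\, X \ge \mathrm{Tr}\, |Y|$ for Hermitian matrices $X$ and $Y$ when $X \ge Y$ and $X \ge -Y$.


1.35 Show that $\mathrm{Tr}\, f(X) > f(\langle u_1|X|u_1\rangle) + f(\langle u_2|X|u_2\rangle)$ for a strictly convex function $f$ when $X$ is a positive matrix on $\mathbb{C}^2$ and $X$ is not commutative with the PVM $\{|u_1\rangle\langle u_1|, |u_2\rangle\langle u_2|\}$.


1.36 Let $\rho = |\psi\rangle\langle\psi|$ be a pure state on $\mathcal{H}_A \otimes \mathcal{H}_B$. Show the following relation for a function $f$.

$$f(\rho_A) \otimes I_B |\psi\rangle = I_A \otimes f(\rho_B) |\psi\rangle \tag{1.39}$$


1.37 Let $\rho = |\psi\rangle\langle\psi|$ be a pure state on $\mathcal{H}_A \otimes \mathcal{H}_B$ such that

$$|\psi\rangle = \sum_i \lambda_i |u_i^A, u_i^B\rangle.$$

Let $V$ be the isometry $\sum_i |u_i^A\rangle\langle u_i^B|$. Show the relation for a function $f$, a matrix $X$ on $\mathcal{H}_A$, and a Hermitian matrix $Y$ on $\mathcal{H}_B$.

$$\mathrm{Tr}\, \rho_A^{1/2} X \rho_A^{1/2} f((V Y V^*)^T) = \langle\psi| X \otimes f(Y) |\psi\rangle, \tag{1.40}$$

where $T$ is the transpose under the basis $\{|u_i^A\rangle\}$.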

1.6 Solutions of Exercises 


Exercise 1.1 Use the fact that the discriminant of $\langle u + rcv|u + rcv\rangle$ concerning $r$ is negative.


Exercise 1.2 Consider the matrix $A := \sum_{i=1}^{k} |e_i\rangle\langle u_i|$, where $|e_i\rangle$ is the vector that has the non-zero value 1 only in the $i$-th entry. The kernel of $A$ has dimension $d - k$. So, we can choose the desired $d - k$ vectors in the kernel of $A$.
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Exercise 1.3 Since X7 = Di alma) (ul, we have (x|Xx) = (x|X'x) > 0, where x = 
> xu; and ¥ = > xiuj. 
Exercise 1.4 Let (xp.;,;) and (y9.;,;) be the elements of X and Y,. Since the elements 


of Xp¥p are (S°, X0;i,4¥0;k,j), their derivatives are (D°, Xp. Voki + X0:i,4N).4,;)- Also, 
we have (Tr X9)! = (30; 6.1,1)' = (Di Gia) Tr(X;). 


Exercise 1.5 Consider the unitary U = (| and the matrix X = (; ae Then, 


T 
0 —id 0 ic 0 —ic 
*\T T]]* 
(UXU*) a 0 ) =( ia 6 )- However, UX U ae 0 ) 


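Exercise 1.6 The expectation is $\sum_i x_i \mathrm{Tr}\, E_{X,i}\rho = \mathrm{Tr}\, \rho X$. The variance is $\sum_i x_i^2 \mathrm{Tr}\, E_{X,i}\rho - (\mathrm{Tr}\, \rho X)^2 = \mathrm{Tr}\, \rho X^2 - (\mathrm{Tr}\, \rho X)^2$.
Exercise 1.7 Since $\rho_1, \rho_2 \ge 0$, $\lambda\rho_1 + (1 - \lambda)\rho_2 \ge 0$. Also, $\mathrm{Tr}(\lambda\rho_1 + (1 - \lambda)\rho_2) = \lambda \mathrm{Tr}\, \rho_1 + (1 - \lambda) \mathrm{Tr}\, \rho_2 = 1$.
Exercise 1.8 Let $\rho = |x\rangle\langle x|$. Consider the projection $P := I - \rho$. Then, $0 = P\rho P = \lambda P\rho_1 P + (1 - \lambda)P\rho_2 P$. So, $P\rho_1 P = P\rho_2 P = 0$. Hence, $0 = \mathrm{Tr}\, P\rho_1 P = \mathrm{Tr}\, P\rho_1$. So, $1 = \mathrm{Tr}\, \rho_1 = \mathrm{Tr}\, \rho\rho_1 = \langle x|\rho_1|x\rangle$, which implies $\rho_1 = \rho$. Similarly $\rho_2 = \rho$.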
Exercise 1.9 Choose the diagonalizations X = /2"|ui) (ui| and Y = ¥7, y/lyj)(vjl 
with x’ > 0, > 0. Then, Tr XY = Dixy Myjlui?. 

Now, we assume that Tr XY = 0. Then, |(v;|v;) | = 0 for non-zero x! and y’. This 
relation implies that XY = 0. The opposite direction is trivial. 


Exercise 1.10 Assume that M = {M,,} is a PVM. M, =M,J =M,,(M, +1 - 
M.,) =M,+M,U — M,). So, we have M,,U — M,,) = 0. Hence, 0 = TrM,,U — 
M.,) = Tr M,,M., because Tr M,,7 — M,, — M,,) = 0. Due to Exercise 1.9, we have 
MM.) = 0. 

Conversely, we assume that M,,M_, = 0 for different w, w’. Then, M,, = M,,J = 
M.,(M, + Sy M,, = M?, which implies that M = {M.,} isa PVM. 


Exercise 1.11 Since p is positive semidefinite, ./M.,p./M., is also positive semi- 
definite. Tr >), /M..p/M,, = >), Tr JM pVM = >, Tr VM VM.p = >, Tr 
M..p = Tr >.,,M.p = TrIp = Tr p = 1. When M is a PVM, we have /M,, = M., 
So, Tr >|, MupM., = Tr >, /MupJ/M, = 1. 


Exercise 1.12 The relation p, > 0 holds if and only if det p, > 0 and 1 > x?. Since 


det py = Ld + (x3)?) + x! — x2)! +271) = toll" the above conditions are 
equivalent with }77_,(x')? < 1. 


Exercise 1.13 If and only if p, is a pure state, the relation det p, = 0 holds. This 
condition is equivalent with the condition ||x|| = 1. 


Exercise 1.14 Any 2 x 2 Hermitian matrix A can be written as er a;S'. Since 
TrA =a’, the condition Tr A = 0 is equivalent with a° = 0. 
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Exercise 1.15 TrX4 @ Xz = > (uj, uP |X4 @ XpluA, u?) 


= Di (ui |Xalui}) (uP |Xp|u?) = TeX - Tr Xp. 
Exercise 1.16 We can show that (X ® Jp), ® Y) = (X @ Y). Similarly, we can 
show that (4 ® Y)(X @ Ip) = (X @ Y). So, we obtain the desired argument. 


Exercise 1.17 Since (X @ Ig)|I) = (X @ Ip)| 1", u? @ v8) = | >i, xu @ uP) = 
(I, @ X7)| 5, uP @ u) = (14 @ X7)|I), we have o = (YXZ ® Ip)|I) = (YX 
® Ip) (Z @ Ig) |L) = (Y @ Ip)(X ® Ip) (a @ Z")|I) = (Y @ Ig) ® Z")(X @ Ig) II) 
= (¥ O27 jz). 

Exercise 1.18 (Y[X) = Dyj.7 Dj 7x (uh, Plu, uP) = Dj yx!) = Tr Y*X. 


Exercise 1.19 Assume ©. Tr(M4.., ® a ® pp 
= (Tr M4.., pa) (Tr Mz..., Pp). The measurement outcomes wa and wg are independent 
of each other. So, we obtain ©. 

Assume @). We fix Mz = {|u)(u|, 7 — |u)(ul}, Le., Mgo = |u) (ul, Mg) = 7 — 
|u)(u|. We choose an arbitrary POVM My = {M,...,} on Hy. Assume that we 
apply independent measurement M, @® My. The marginal distribution of wy is 
~ Tr(M4.0, © Mp.u,)e = TrMa.., ® Ip)p = Tr My, (Tre p). Due to the condi- 
tion @, when the outcome of wg is 0, the conditional distribution is also Tr M4..., 
(Trg p). So, we have Tr M,..,,(u|p|u) = Tr My..., Trae ® |u) (ul)p = Tr(Ma.n., ® 
Ju) (ul) = (Tr Mg,.., (Tre p) (Tr ® |u) (ul)p) = (Tr My,., (Trg p)) (ul Tra plu) = 
(Tr Ma, (Trg p)) Tr(u|p|u). Since the above equation holds for any POVM My, on 
Ha, we have ©. 

Assume @. For any two vectors |u4) € Ha and |ug) € 7g, we have Tr |ua,, 
ug) (Ua, Uplp = (ual Tre pla) (us| Tra plug) = Tr |ua, up) (ua, Up|(Ttg p) @ (Tra p). 
Since any Hermitian matrices on H4 @ Hg can be written as linear combinations 
of |u4, Up) (Ua, UB|, P = (Tre p) © (Tr, p), which implies ©. 


‘ A Bo ij A Ik Bo Halk A B 
Exercise 1.20 We have v* ® ug = 9); U4; @ Di) Vg uy = Di; V4Uguy @ uy. 


Exercise 1.21 Since Tr Sisé = 26;,;, we have 
1 
(ep? |e) = 5 Tr nye = OKI 
Exercise |.22 To show (1.27), it is sufficient to show that 


Tr p(X ® Ip) = Tr >> put) (uf |X. 
ij 


When X = 97; ; X;,illit) (up |, we have 
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Tr p(X ® Ip) = > xj ilu! & ub) (us ® uflp 
i,j,k 


= Do aige'l = Ted) out) (uf |X. 
ij ij 


Since p'/ = a p%)-@), (1.27) implies (1.28). Since 


d d’ 
Tra Trg pX = Tr pX ® Ip = Trp >) P(X @ Ip) = SS Tr pPx(X @ Ip) 
k=1 k=1 
d' d' d' 
= DL Tr pPe(X @ Ip) Pe = D1 Tr PrpPu(X @ Ip) = Tra > (Tre PepPn)X, 
k=1 k=1 k=1 


we have (1.29). 
Exercise | .23 


(a) When [pa, 74] = [pz, 78] = 9, 


Pa ® ppo, ®@ Og = (Pa @ Ip) Ua ® pp) (G4 ® 14) Uy @ OB) 
=(04 @ I4) U4 © 0B) (pa @ Ip) Ua ® pz) = 04 @ OBpa ® pp. 


(b) When Tr pao, = 0, we have pao, = 0. Hence, 

Pa ® ppoa @ op = 0 = 04 @ OBpa ® pp. 
(bf c) Assume that the relations (1.32) hold. When Tr pac, = 0 or Tr pgog = 0, (b) 
implies (1.31). When neither Tr p40, = 0 or Tr pgog = 0 does not holds, [p4, 74] = 


[eB, 0p] = O. Then, (a) implies (1.31). 
(d) Assume that the relations (1.31) hold. Take the partial trace on A. 


(Tr paca) ppop = (Tr o4pa)osps, 


which implies the first condition of (1.32). Similarly, taking the partial trace on B, 
we obtain the second condition of (1.32). 


Exercise | .24 It is sufficient to show 
TrX(U4 @ Y)(Z @ Ip) = Try @ Y)X(Z @ Iz) 
for an matrix Z on 1,4. Since (4 ® Y) is commutative with (Z @ Ig), 


TrX(4 @ Y)(Z @ Ip) = TrX(Z @ Ip) (a @ Y) = Tr @ Y)X(Z @ I). 
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Trg Vp © polX, Y @ Iglv/p @ po 
= Trp(/p @ Ig), ® Spo) (X(Y @ Ip) — (Y @ Ig)X)(/p @ Ip) a ® 4/P0) 
= Trg(./p ® Ig)[Ua ® /P0)X LA ® /p0)(Y ® Ip) 

— (Y @Iz)Uh ® J/p0)X (Lh ® /p0) 1/0 ® Ip) 
= Jp(Tral(a @ Jo) X Ua @ /Po)I¥ — Y TralUa @ /p0)X Ua ® /P0)])/ 0 
=JplTre (14 ® v/p0) X (Is ® V/p0) » Y1VP- 


Exercise |.26 It is sufficient to show that 
Tr(\u)(u4| ® Ip)X (I, @ Z) = Tr PXP(I, @ Z) 
for a matrix Z on 1g. This can be shown as follows. 


Tr(\u4) (u“| ® Ip)X (U4 ® Z) = Tr PX (I, @ Z) 
=Tr PX (I, @ Z)P = Tr PXP(I; ® Z). 


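Exercise 1.27


(a) Since $X - Y \ge 0$ and $Y - Z \ge 0$, we have $X - Z = X - Y + Y - Z \ge 0$.
(b) Since $X - Y \ge 0$ and $-(X - Y) \ge 0$, $X - Y = 0$, which implies $X = Y$.


Exercise 1.28 Since $X$ and $\sqrt{Y}$ commute, we have

$$X \circ Y = \frac{1}{2}(XY + YX) = \frac{1}{2}\left(\sqrt{Y}X\sqrt{Y} + \sqrt{Y}X\sqrt{Y}\right) = \sqrt{Y}X\sqrt{Y} \ge 0.$$

Take a common diagonal basis $\{|u_i\rangle\}$. Then, $X$ and $Y$ are written as $X = \sum_i x_i|u_i\rangle\langle u_i|$ and $Y = \sum_i y_i|u_i\rangle\langle u_i|$. So,

$$\{X > 0\} + \{Y \ge 0\} = \sum_{i: x_i > 0} |u_i\rangle\langle u_i| + \sum_{j: y_j \ge 0} |u_j\rangle\langle u_j| \ge \sum_{i: x_i + y_i > 0} |u_i\rangle\langle u_i| = \{X + Y > 0\}.$$


Exercise 1.29 Since $\sqrt{Y}$ is commutative with $X_1$ and $X_2$, we have

$$X_1 Y X_1 = \sqrt{Y}X_1^2\sqrt{Y} \ge \sqrt{Y}X_2^2\sqrt{Y} = X_2 Y X_2.$$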


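Exercise 1.30 Since $X \circ Y = \frac{1}{4}\begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}$, we have $\det(X \circ Y) = -\frac{1}{16}$. Since a matrix cannot be positive semidefinite if its determinant is negative, the matrix $X \circ Y$ is not positive semidefinite.


Exercise 1.31 Since $X_1 Y X_1 = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ and $X_2 Y X_2 = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, we have

$$X_1 Y X_1 - X_2 Y X_2 = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.$$

Since $\det \frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = -\frac{1}{4}$, the relation $X_1 Y X_1 \ge X_2 Y X_2$ does not hold.


Exercise 1.32 The relation $Y \ge X$ holds. Since

$$Y^2 - X^2 = \begin{pmatrix} 5 - 2 & 3 - 2 \\ 3 - 2 & 2 - 2 \end{pmatrix} = \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix},$$

$\det(Y^2 - X^2) = -1$, which implies that the matrix $Y^2 - X^2$ is not positive semidefinite.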


Exercise |.33 Let xo be the maximum eigenvalue of X among eigenvalues strictly 
smaller than x. Assume that X = >> , Xi|ui) (uj. Then, let Xo be the Hermitian matrix 
a x;\uj) (uj + Le Xo|uj)(u;. Then, we have Xj) > X > xP and rank{X — 
xl > 0} = rank{Xo — xI => 0}. The relation Xp > xP implies that P(Xo — x91)P => 
(x —x0)P. Hence, rank{Xo — xl > 0} > rank{Xo — xoJ > O} > rank P(Xo — xol) 
P=rankP. 


Exercise 1.34 Since {Y > O}X{Y > 0} = {Y = O}Y{Y = O}and{Y < O}X{Y < 0} = 
—{Y < O}Y{Y <0}, we have Tr{Y > O}X{Y > 0} => Tr{Y > O}Y{Y => O} and Tr 
{Y < O}X{Y < 0} > —Tr{Y < O}Y{Y < 0}. Hence, we have TrX = Tr{Y > 0} 
X{Y > 0} + Tr{Y < O}X{Y < 0} => Tr{Y = O}Y{Y = 0} — Tr{Y < O}Y{Y < 0} = 

Tr |Y|. 

Exercise 1.35 Assume that a and b are eigenvalues of X and (u;|X|u,) = ap + b0. — 
Pp) withO < p < 1. Then, we have (u2|X|u2) = bp + a(1 — p). Since f is strictly con- 
vex, fap + f(b)U — p) > flap + bU — p)) and f(b)p + f(a)d — p) > flop + 
a(1—p)). Thus, Trf(X) =f(@p+f(b)C — p) +f)p+f@CU — p) > flap + 
bl — p)) + f(bp + ad — p)) = f (ui |X|u1)) +f C(u2|X |u2)). 

Exercise 1.36 According to (A.10), we choose the bases {|u)} and {|u?)} of Ha and 
He such that 


I) = Do Alu, uP), 


where \; > 0. Then, pa = >, A? |) (u| and pg = >-; A? |v?) (u8|. Thus, 
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f(pa) Isl) = Do FOP) (| ® Tal) 


= DFO), uP) = Ls @ f (P81). 


Exercise 1.37 We have 


(|X @ FL) b) = D> AA; (4 Xue") (uP if Wu?) 
ij 
= DENA (ue Xu) (a? VIO") 
ij 
= DNA (MEX Leg) (uh FV PV) ee) 
ij 
= Do AA |X (uc) FV")? ) ua!) = Tr piXpif ((VYB")"). 


ij 
Chapter 2
Information Quantities and Parameter 
Estimation in Classical Systems 


Abstract The study of quantum information theory draws on mathematical statistics and information geometry, which are mainly examined in a nonquantum context. This chapter briefly summarizes the fundamentals of these topics from a unified viewpoint. Since these topics are usually treated individually, this chapter will be useful even for nonquantum applications.


2.1 Information Quantities in Classical Systems 


When all the given density matrices $\rho_1, \ldots, \rho_n$ commute, they may be simultaneously diagonalized using a common orthonormal basis $\{u^1, \ldots, u^d\}$ according to $\rho_1 = \sum_i p_{1,i}|u^i\rangle\langle u^i|, \ldots, \rho_n = \sum_i p_{n,i}|u^i\rangle\langle u^i|$. In this case, it is sufficient to treat only the diagonal elements, i.e., we discuss only the probability distributions $p_1, \ldots, p_n$. Henceforth we will refer to such cases as classical because they do not exhibit any quantum properties. Let us now examine various information quantities with respect to probability distributions.


2.1.1 Entropy 


Before proceeding to the definition of information quantities, we prepare the notations for basic probability theory. For a given probability distribution $p = \{p_x\}_{x \in \Omega}$ of the real-valued random variable $X$, we define the expectation $E_p(X)$ as

$$E_p(X) := \sum_{x \in \Omega} x\, p_x. \tag{2.1}$$

When the number $-\log p_x$ is regarded as a real-valued random variable, the Shannon entropy is defined as the expectation of this random variable under the probability distribution $p$, i.e.,¹

$$H(p) := \sum_{x \in \Omega} -p_x \log p_x. \tag{2.2}$$

¹ In this case, we consider $0 \log 0$ to be 0.


It is often simply called entropy. That is, when $\mathcal{P}(\Omega)$ denotes the set of probability distributions on the probability space $\Omega$, $H$ is a real-valued function on $\mathcal{P}(\Omega)$. Sometimes, we denote the probability distribution of a random variable $X$ by $P_X$. In this case, we write the entropy of $P_X$ as $H(X)$. For $\Omega = \{0, 1\}$, the probability distribution is written as $(a, 1 - a)$ and the entropy is called a binary entropy, which is given by $h(a) := -a \log a - (1 - a) \log(1 - a)$.
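As a small illustration of (2.2) and of the binary entropy, the following sketch evaluates both with natural logarithms. The function names are hypothetical and chosen only for this example.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_x p_x log p_x, where 0 log 0 is taken to be 0."""
    return -sum(px * math.log(px) for px in p if px > 0)

def binary_entropy(a):
    """h(a) = -a log a - (1 - a) log(1 - a)."""
    return shannon_entropy([a, 1.0 - a])

print(shannon_entropy([0.5, 0.25, 0.25]))
print(binary_entropy(0.5))  # equals log 2 in nats
```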

When the number of elements of $\Omega$ is a finite number $k$, it is possible to choose the distribution so that all probabilities $p_i$ have the same value. Such a probability distribution $p = (p_i)$ is called a uniform distribution and is denoted by $p_{\mathrm{mix},\Omega}$. It is abbreviated to $p_{\mathrm{mix}}$ for simplicity. If it is necessary to denote the number of supports $k$ explicitly, we write $p_{\mathrm{mix},k}$. As shown later, any distribution $p$ on $\Omega$ satisfies the relation

$$H(p) \le \log k = H(p_{\mathrm{mix},k}). \tag{2.3}$$


The entropy $H(P_{X,Y})$ of the joint distribution $P_{X,Y}$ for two random variables $X$ and $Y$ is denoted by $H(X, Y)$. In particular, if $Y$ can be expressed as $f(X)$, where $f$ is a function, then (Exe. 2.1)

$$H(X, Y) = H(X, f(X)) = H(X). \tag{2.4}$$


Given a conditional probability $P_{X|Y=y} = \{P_{X|Y}(x|y)\}_x$, the entropy of $X$ is given by $H(X|Y = y) := H(P_{X|Y=y})$ when the random variable $Y$ is known to be $y$. The expectation of this entropy with respect to the probability distribution of $Y$ is called the conditional entropy and is denoted by $H(X|Y)$. We may write it as

$$\begin{aligned}
H(X|Y) &:= -\sum_y P_Y(y) \sum_x P_{X|Y}(x|y) \log P_{X|Y}(x|y) \\
&= -\sum_{x,y} P_{X,Y}(x, y) \log \frac{P_{X,Y}(x, y)}{P_Y(y)} \\
&= -\sum_{x,y} P_{X,Y}(x, y) \log P_{X,Y}(x, y) + \sum_y P_Y(y) \log P_Y(y) \\
&= H(X, Y) - H(Y).
\end{aligned} \tag{2.5}$$

The final equation in (2.5) is called the chain rule. Using the chain rule (2.5) and (2.4), we have

$$H(X) = H(f(X)) + H(X|f(X)) \ge H(f(X)), \tag{2.6}$$

which is called monotonicity.
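The chain rule $H(X|Y) = H(X, Y) - H(Y)$ and the monotonicity (2.6) can be checked on a small joint distribution. The table `joint` and the merging function $f$ below are hypothetical values chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, with 0 log 0 taken to be 0."""
    return -sum(v * math.log(v) for v in probs if v > 0)

# Hypothetical joint distribution P_{X,Y}(x, y); rows index x, columns index y.
joint = [[0.30, 0.10],
         [0.05, 0.25],
         [0.20, 0.10]]

p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]
h_xy = entropy([v for row in joint for v in row])

# Conditional entropy from its definition: sum_y P_Y(y) H(X | Y = y).
h_x_given_y = sum(
    py * entropy([joint[x][y] / py for x in range(len(joint))])
    for y, py in enumerate(p_y) if py > 0
)
chain_rule_gap = h_x_given_y - (h_xy - entropy(p_y))

# Monotonicity for f(x) = min(x, 1), which merges the outcomes x = 1 and x = 2.
p_fx = [p_x[0], p_x[1] + p_x[2]]
print(chain_rule_gap, entropy(p_x), entropy(p_fx))
```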
Applying (2.4) to the distribution $P_{X|Y=y}$, we have

$$\begin{aligned}
H(X, f(X, Y)|Y) &= \sum_y P_Y(y) H(X, f(X, y)|Y = y) \\
&= \sum_y P_Y(y) H(X|Y = y) = H(X|Y).
\end{aligned} \tag{2.7}$$

Since (as will be shown later)

$$H(X) + H(Y) - H(X, Y) \ge 0, \tag{2.8}$$

we have

$$H(X) \ge H(X|Y). \tag{2.9}$$

If $Y$ takes values in $\{0, 1\}$, (2.9) is equivalent to the concavity of the entropy (Exe. 2.2):

$$\lambda H(p) + (1 - \lambda) H(p') \le H(\lambda p + (1 - \lambda) p'), \quad 0 < \forall \lambda < 1. \tag{2.10}$$


Exercises 
2.1 Verify (2.4) if the variable $Y$ can be written as $f(X)$ for a function $f$.
2.2 Verify that (2.9) and (2.10) are equivalent.


2.3 Let $p = \{p_i\}$ be a distribution on $\{1, \ldots, k\}$. Assume that the maximum probability $p_i$ is larger than $a$. Verify that $H(p) \le h(a) + (1 - a) \log(k - 1)$.


2.4 Define $p_A \times p_B(\omega_A, \omega_B) := p_A(\omega_A) p_B(\omega_B)$ in $\Omega_A \times \Omega_B$ for probability distributions $p_A$ in $\Omega_A$ and $p_B$ in $\Omega_B$. Show that

$$H(p_A) + H(p_B) = H(p_A \times p_B). \tag{2.11}$$


2.1.2 Relative Entropy 


We now consider a quantity that expresses the closeness between two probability distributions $p = \{p_i\}_{i \in \Omega}$ and $q = \{q_i\}_{i \in \Omega}$. It is called an information quantity because our access to information is closely related to the difference between the distributions reflecting the information of our interest. A typical example is the relative entropy² $D(p\|q)$, which is defined as

$$D(p\|q) := \sum_{i \in \Omega} p_i \log \frac{p_i}{q_i}. \tag{2.12}$$

² The term relative entropy is commonly used in statistical physics. In information theory, it is generally known as the Kullback–Leibler divergence, while in statistics it is known as the Kullback–Leibler information.

This quantity is always no less than 0, and it is equal to 0 if and only if $p = q$. This can be shown by applying the logarithmic inequality (Exe. 2.6) "$\log x \le x - 1$ for $x > 0$" to (2.12):

$$0 - D(p\|q) = \sum_i p_i \left( -\frac{q_i}{p_i} + 1 + \log \frac{q_i}{p_i} \right) \le \sum_i p_i \cdot 0 = 0.$$

Note that the equality in $\log x \le x - 1$ holds only when $x = 1$. We may obtain (2.3) by using the positivity of the relative entropy for the case $q = \{1/k\}$.
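The step from the positivity of the relative entropy to (2.3) rests on the identity $D(p\|p_{\mathrm{mix},k}) = \log k - H(p)$. A minimal numerical sketch follows; returning infinity when some $q_i = 0$ with $p_i > 0$ is a common convention assumed here, not something stated in the text.

```python
import math

def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)

def relative_entropy(p, q):
    """D(p||q) = sum_i p_i log(p_i / q_i); terms with p_i = 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # assumed convention for p_i > 0, q_i = 0
            total += pi * math.log(pi / qi)
    return total

p = [0.5, 0.25, 0.25]
k = len(p)
uniform = [1.0 / k] * k
gap = relative_entropy(p, uniform) - (math.log(k) - entropy(p))
print(relative_entropy(p, uniform), gap)
```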
Let us now consider possible information processes. For simplicity, we assume that the probability space $\Omega$ is given as the set $N_k = \{1, \ldots, k\}$. When an information process converts a set $N_k = \{1, \ldots, k\}$ to another set $N_l$ deterministically, we may denote the information processing by a function from $N_k$ to $N_l$. If it converts probabilistically, it is denoted by a real-valued matrix $\{Q^i_j\}$ in which every element $Q^i_j$ represents the probability of the output data $j \in N_l$ when the input data are $i \in N_k$. This matrix $Q = (Q^i_j)$ satisfies $\sum_j Q^i_j = 1$ for each $i$. Such a matrix $Q$ is called a stochastic transition matrix. In this notation, $Q^i$ expresses the distribution $(Q^i_1, \ldots, Q^i_l)$ on the output system with the input $i$. When the input signal is generated according to the probability distribution $p$, the output signal is generated according to the probability distribution $Q(p)_j := \sum_{i=1}^{k} Q^i_j p_i$. The stochastic transition matrix $Q$ represents not only such probabilistic information processes but also probabilistic fluctuations in the data due to noise. Furthermore, since it expresses the probability distribution of the output system for each input signal, we can also use it to model a channel transmitting information.

A fundamental property of a stochastic transition matrix $Q$ is the inequality

$$D(p\|q) \ge D(Q(p)\|Q(q)), \tag{2.13}$$

which is called an information-processing inequality. This property is often called monotonicity.³ The inequality implies that the amount of information should not increase via any information processing. This inequality will be proved for the general case in Theorem 2.1. It may also be shown using a logarithmic inequality.
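The information-processing inequality can be observed numerically for a concrete stochastic transition matrix. In the sketch below, `Q[i][j]` stores the probability of output $j$ for input $i$; the matrix and the two distributions are hypothetical values with strictly positive entries.

```python
import math

def relative_entropy(p, q):
    """D(p||q) for distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def apply_channel(Q, p):
    """Output distribution Q(p)_j = sum_i Q[i][j] * p[i]."""
    return [sum(Q[i][j] * p[i] for i in range(len(p)))
            for j in range(len(Q[0]))]

# Each row of Q is the output distribution for one input symbol.
Q = [[0.9, 0.1],
     [0.2, 0.8],
     [0.5, 0.5]]
p = [0.6, 0.3, 0.1]
q = [0.2, 0.3, 0.5]

before = relative_entropy(p, q)
after = relative_entropy(apply_channel(Q, p), apply_channel(Q, q))
print(before, after)
```

The value after the channel is never larger than the value before it.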

For example, consider the stochastic transition matrix $Q = (Q^i_j)$ from $N_{2k}$ to $N_k$, where $Q^i_j$ is 1 when $i = j, j + k$ and 0 otherwise. Given two probability distributions $p, p'$ in $N_k$, we define the probability distribution $\tilde{p}$ for $N_{2k}$ as

$$\tilde{p}_i := \lambda p_i, \quad \tilde{p}_{i+k} := (1 - \lambda) p'_i, \quad 1 \le \forall i \le k$$

with a real number $\lambda \in (0, 1)$. Similarly, we define $\tilde{q}$ for two probability distributions $q, q'$ in $N_k$. Then,

$$D(\tilde{p}\|\tilde{q}) = \lambda D(p\|q) + (1 - \lambda) D(p'\|q').$$

Since $Q(\tilde{p}) = \lambda p + (1 - \lambda) p'$ and $Q(\tilde{q}) = \lambda q + (1 - \lambda) q'$, the information-processing inequality (2.13) yields the joint convexity of the relative entropy

$$\lambda D(p\|q) + (1 - \lambda) D(p'\|q') \ge D(\lambda p + (1 - \lambda) p' \| \lambda q + (1 - \lambda) q'). \tag{2.14}$$

³ In this book, monotonicity refers to only the monotonicity regarding the change in probability distributions or density matrices.


Next, let us consider other information quantities that express the difference between the two probability distributions $p$ and $q$. In order to express the amount of information, these quantities should satisfy the property given by (2.13). This property can be satisfied by constructing the information quantity in the following manner. First, we define convex functions. When a function $f$ satisfies

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2), \quad 0 \le \forall \lambda \le 1, \ \forall x_1, x_2 \in \mathbb{R},$$

it is called a convex function. For a probability distribution $p = \{p_i\}$, a convex function $f$ satisfies Jensen's inequality:

$$\sum_i p_i f(x_i) \ge f\left(\sum_i p_i x_i\right). \tag{2.15}$$


Theorem 2.1 (Csiszár [1]) Let $f$ be a convex function. The information quantity $D_f(p\|q) := \sum_i q_i f\left(\frac{p_i}{q_i}\right)$ then satisfies the monotonicity condition

$$D_f(p\|q) \ge D_f(Q(p)\|Q(q)). \tag{2.16}$$

Henceforth, $D_f(p\|q)$ will be called an $f$-relative entropy.⁴


For example, for $f(x) = x \log x$ we obtain the relative entropy. For $f(x) = 1 - \sqrt{x}$,

$$D_f(p\|q) = 1 - \sum_i \sqrt{p_i}\sqrt{q_i} = \frac{1}{2} \sum_i \left(\sqrt{p_i} - \sqrt{q_i}\right)^2. \tag{2.17}$$

Its square root is called the Hellinger distance and is denoted by $d_2(p, q)$. This satisfies the axioms of a distance (Exe. 2.14). When $f(x) = \frac{4}{1 - \alpha^2}\left(1 - x^{(1+\alpha)/2}\right)$ ($-1 < \alpha < 1$), $D_f(p\|q)$ is equal to the $\alpha$-divergence $\frac{4}{1 - \alpha^2}\left(1 - \sum_i p_i^{(1+\alpha)/2} q_i^{(1-\alpha)/2}\right)$ according to Amari and Nagaoka [2]. By applying inequality (2.16) to the concave function $x \mapsto x^s$ ($0 \le s \le 1$) and the convex function $x \mapsto x^s$ ($s \le 0$), we obtain the inequalities

$$\sum_i p_i^{1-s} q_i^s \le \sum_j Q(p)_j^{1-s} Q(q)_j^s \quad \text{for } 0 \le s \le 1,$$
$$\sum_i p_i^{1-s} q_i^s \ge \sum_j Q(p)_j^{1-s} Q(q)_j^s \quad \text{for } s \le 0.$$

Hence, the quantity $\phi(s|p\|q) := \log\left(\sum_i p_i^{1-s} q_i^s\right)$ satisfies the monotonicity

$$\phi(s|p\|q) \le \phi(s|Q(p)\|Q(q)) \quad \text{for } 0 \le s \le 1,$$
$$\phi(s|p\|q) \ge \phi(s|Q(p)\|Q(q)) \quad \text{for } s \le 0.$$

⁴ This quantity is more commonly used in information theory, where it is called $f$-divergence [1]. In this text, we prefer to use the term "relative entropy" for all relative-entropy-like quantities.


The relative entropy can be expressed as

$$\phi'(0|p\|q) = -D(p\|q), \quad \phi'(1|p\|q) = D(q\|p). \tag{2.18}$$

Since $\phi(s|p\|q)$ is a convex function of $s$ (Exe. 2.16), the relative Rényi entropy [3]

$$D_{1-s}(p\|q) := -\frac{\phi(s|p\|q)}{s} = -\frac{\phi(s|p\|q) - \phi(0|p\|q)}{s} = -\frac{1}{s} \log \sum_i p_i^{1-s} q_i^s \tag{2.19}$$

is monotone decreasing for $s$ (Exe. 2.17). More precise analyses for these quantities are given in Exercises 3.45, 3.52, and 3.53.

We will abbreviate $\phi(s|p\|q)$ to $\phi(s)$ if it is not necessary to specify $p$ and $q$ explicitly. We also define the maximum and the minimum relative entropies as

$$D_{\max}(p\|q) := \log \max_i \frac{p_i}{q_i}, \quad D_{\min}(p\|q) := -\log \sum_{i: p_i > 0} q_i. \tag{2.20}$$

Then, we obtain the relations (Exe. 2.18, 2.19)

$$\lim_{s \to -\infty} D_{1-s}(p\|q) = D_{\max}(p\|q), \quad \lim_{s \to 1} D_{1-s}(p\|q) = D_{\min}(p\|q), \tag{2.21}$$
$$\lim_{s \to 0} D_{1-s}(p\|q) = D(p\|q). \tag{2.22}$$

That is, $D_{\max}(p\|q)$ and $D_{\min}(p\|q)$ give the maximum and the minimum values of $D_{1-s}(p\|q)$, respectively.
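The behavior of the relative Rényi entropy in $s$ can be inspected numerically. The sketch below assumes the forms $\phi(s|p\|q) = \log \sum_i p_i^{1-s} q_i^s$ and $D_{1-s}(p\|q) = -\phi(s|p\|q)/s$ and strictly positive distributions; the two distributions are hypothetical.

```python
import math

def phi(s, p, q):
    """phi(s|p||q) = log sum_i p_i^(1-s) q_i^s (strictly positive p, q)."""
    return math.log(sum(pi ** (1 - s) * qi ** s for pi, qi in zip(p, q)))

def renyi_relative(s, p, q):
    """D_{1-s}(p||q) = -phi(s|p||q) / s, defined for s != 0."""
    return -phi(s, p, q) / s

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

d_kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
d_max = math.log(max(pi / qi for pi, qi in zip(p, q)))
d_min = -math.log(sum(qi for pi, qi in zip(p, q) if pi > 0))

grid = [-5.0, -1.0, -0.1, 0.1, 0.5, 1.0]
values = [renyi_relative(s, p, q) for s in grid]
print(values, d_kl, d_max, d_min)
```

The values decrease along the grid, approach $D(p\|q)$ near $s = 0$, and sit between $D_{\min}$ and $D_{\max}$.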


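Proof of Theorem 2.1 Since $f$ is a convex function, Jensen's inequality ensures that

$$\sum_i \frac{Q^i_j q_i}{\sum_{i'} Q^{i'}_j q_{i'}} f\left(\frac{p_i}{q_i}\right) \ge f\left(\sum_i \frac{Q^i_j q_i}{\sum_{i'} Q^{i'}_j q_{i'}} \frac{p_i}{q_i}\right) = f\left(\frac{\sum_i Q^i_j p_i}{\sum_{i'} Q^{i'}_j q_{i'}}\right).$$

Therefore,

$$\begin{aligned}
D_f(Q(p)\|Q(q)) &= \sum_j \left(\sum_{i'} Q^{i'}_j q_{i'}\right) f\left(\frac{\sum_i Q^i_j p_i}{\sum_{i'} Q^{i'}_j q_{i'}}\right) \\
&\le \sum_j \left(\sum_{i'} Q^{i'}_j q_{i'}\right) \sum_i \frac{Q^i_j q_i}{\sum_{i'} Q^{i'}_j q_{i'}} f\left(\frac{p_i}{q_i}\right) \\
&= \sum_j \sum_i Q^i_j q_i f\left(\frac{p_i}{q_i}\right) = \sum_i q_i f\left(\frac{p_i}{q_i}\right) = D_f(p\|q).
\end{aligned}$$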


We consider the variational distance as another information quantity. It is defined as

$$d_1(p, q) := \frac{1}{2} \sum_i |p_i - q_i|. \tag{2.23}$$

It is the $f$-relative entropy when $f(x)$ is chosen to be $\frac{1}{2}|1 - x|$. Hence, it satisfies the monotonicity property (Exe. 2.9)

$$d_1(Q(p), Q(q)) \le d_1(p, q). \tag{2.24}$$

The variational distance, Hellinger distance, and relative entropy are related by the following formulas:

$$d_1(p, q) \ge d_2^2(p, q) \ge \frac{1}{2} d_1^2(p, q), \tag{2.25}$$

$$D(p\|q) \ge -2 \log\left(\sum_i \sqrt{p_i}\sqrt{q_i}\right) \ge 2 d_2^2(p, q). \tag{2.26}$$

The last inequality may be deduced from the logarithmic inequality. The combination of (2.25) and (2.26) is called the Pinsker inequality.
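The chain of inequalities relating the variational distance, the Hellinger distance, and the relative entropy can be spot-checked on an example. The sketch assumes $d_1(p,q) = \frac{1}{2}\sum_i |p_i - q_i|$ and $d_2^2(p,q) = 1 - \sum_i \sqrt{p_i q_i}$; the distributions are hypothetical.

```python
import math

def variational_distance(p, q):
    """d_1(p, q) = (1/2) sum_i |p_i - q_i|."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger_squared(p, q):
    """d_2(p, q)^2 = 1 - sum_i sqrt(p_i q_i)."""
    return 1.0 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def relative_entropy(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

d1 = variational_distance(p, q)
d2_sq = hellinger_squared(p, q)
affinity_bound = -2.0 * math.log(1.0 - d2_sq)  # -2 log sum_i sqrt(p_i q_i)
print(d1, d2_sq, affinity_bound, relative_entropy(p, q))
```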

When a stochastic transition matrix $Q = (Q^i_j)$ satisfies $\sum_i Q^i_j = 1$, i.e., its transpose is also a stochastic transition matrix, the stochastic transition matrix $Q$ is called a double stochastic transition matrix. Now, we assume that the input symbol $i$ and the output symbol $j$ take values in $1, \ldots, k_1$ and $1, \ldots, k_2$, respectively. When the stochastic transition matrix $Q$ is double stochastic, we have $k_2 = \sum_{j=1}^{k_2} 1 = \sum_{j=1}^{k_2} \sum_{i=1}^{k_1} Q^i_j = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} Q^i_j = \sum_{i=1}^{k_1} 1 = k_1$. That is, any double stochastic matrix is a square matrix.

A stochastic transition square matrix $Q$ is a double stochastic transition matrix if and only if the output distribution $Q(p_{\mathrm{mix}})$ is a uniform distribution because $Q(p_{\mathrm{mix}})_j = \sum_i Q^i_j \frac{1}{k}$. The double stochastic transition matrix $Q$ and the probability distribution $p$ satisfy

$$\log k - H(Q(p)) = D(Q(p)\|p_{\mathrm{mix},k}) \le D(p\|p_{\mathrm{mix},k}) = \log k - H(p),$$

which implies that

$$H(Q(p)) \ge H(p). \tag{2.27}$$


Exercises 


2.5 Show that

$$D(p_A\|q_A) + D(p_B\|q_B) = D(p_A \times p_B\|q_A \times q_B) \tag{2.28}$$

for probability distributions $p_A, q_A$ in $\Omega_A$ and $p_B, q_B$ in $\Omega_B$.


2.6 Show that the logarithmic inequality, i.e., the inequality $\log x \le x - 1$, holds for $x > 0$ and that the equality holds only for $x = 1$.


2.7 Show that the $f$-relative entropy $D_f(p\|q)$ of a convex function $f$ satisfies $D_f(p\|q) \ge f(1)$.


2.8 Prove (2.17).
2.9 Show that the variational distance satisfies the monotonicity condition (2.24).


2.10 Show that $d_1(p, q) \ge d_2^2(p, q)$ by first proving the inequality $|x - y| \ge (\sqrt{x} - \sqrt{y})^2$.


2.11 Show that $d_2^2(p, q) \ge \frac{1}{2} d_1^2(p, q)$ following the steps below.
(a) Prove

$$\left(\sum_i |p_i - q_i|\right)^2 \le \left(\sum_i |\sqrt{p_i} - \sqrt{q_i}|^2\right)\left(\sum_i |\sqrt{p_i} + \sqrt{q_i}|^2\right)$$

using the Schwarz inequality.
(b) Show that $\sum_i |\sqrt{p_i} + \sqrt{q_i}|^2 \le 4$.
(c) Show that $d_2^2(p, q) \ge \frac{1}{2} d_1^2(p, q)$ using the above results.


2.12 Show that $d_1(p, q) \le \sum_{x \neq x_0} |p_x - q_x|$ for any $x_0$.


2.13 Show that $D(p\|q) \ge -2 \log\left(\sum_i \sqrt{p_i}\sqrt{q_i}\right)$.
2.14 Verify that the Hellinger distance satisfies the axioms of a distance by following the steps below.
(a) Prove the following for arbitrary vectors $x$ and $y$:

$$(\|x\| + \|y\|)^2 \ge \|x\|^2 + \langle x|y\rangle + \langle y|x\rangle + \|y\|^2.$$

(b) Prove the following for arbitrary vectors $x$ and $y$:

$$\|x\| + \|y\| \ge \|x + y\|.$$

(c) Show the following for the three probability distributions $p$, $q$, and $r$:

$$\sqrt{\sum_i \left(\sqrt{p_i} - \sqrt{q_i}\right)^2} \le \sqrt{\sum_i \left(\sqrt{p_i} - \sqrt{r_i}\right)^2} + \sqrt{\sum_i \left(\sqrt{r_i} - \sqrt{q_i}\right)^2}.$$

Note that this formula is equivalent to the axiom of a distance $d_2(p, q) \le d_2(p, r) + d_2(r, q)$ for the Hellinger distance.


2.15 Show (2.18).
2.16 Show that $\phi(s|p\|q)$ is convex for $s$.


2.17 Show that $\frac{f(s)}{s}$ is (strictly) monotone increasing for $s$ when $f(0) = 0$ and $f(s)$ is (strictly) convex for $s$.


2.18 Show that $\lim_{s \to -\infty} D_{1-s}(p\|q) = D_{\max}(p\|q)$ by following the steps below.
(a) Show that $\frac{1}{t} \log\left(\sum_i a_i b_i^t\right) \to \log \max(b_1, \ldots, b_l)$ as $t \to \infty$ for $a_i, b_i > 0$.
(b) Show the desired equation.


2.19 Show that $\lim_{s \to 1} D_{1-s}(p\|q) = D_{\min}(p\|q)$.


2.20 Show that

$$D(p\|q) = \max_{\lambda} \left( \sum_{i=1}^{k} p_i \lambda_i - \log \sum_{i=1}^{k} q_i e^{\lambda_i} \right) \tag{2.29}$$

for two probability distributions $p$ and $q$ on $\{1, \ldots, k\}$.


2.1.3 Mutual Information 


Given the joint probability distribution $P_{X,Y}$ of two random variables $X$ and $Y$, the marginal distributions $P_X$ and $P_Y$ of $P_{X,Y}$ are defined as

$$P_X(x) := \sum_y P_{X,Y}(x, y) \quad \text{and} \quad P_Y(y) := \sum_x P_{X,Y}(x, y).$$

Then, the conditional distribution is calculated as

$$P_{X|Y}(x|y) = \frac{P_{X,Y}(x, y)}{P_Y(y)}.$$

When $P_X(x) = P_{X|Y}(x|y)$, two random variables $X$ and $Y$ are independent. In this case, the joint distribution $P_{X,Y}(x, y)$ is equal to the product of marginal distributions $P_X \times P_Y(x, y) := P_X(x) P_Y(y)$. That is, the relative entropy $D(P_{X,Y}\|P_X \times P_Y)$ is equal to zero. We now introduce mutual information $I(X : Y)$, which expresses how different the joint distribution $P_{X,Y}(x, y)$ is from the product of marginal distributions $P_X(x) P_Y(y)$. This quantity satisfies the following relation:

$$\begin{aligned}
I(X : Y) &:= D(P_{X,Y}\|P_X \times P_Y) = \sum_{x,y} P_{X,Y}(x, y) \log \frac{P_{X,Y}(x, y)}{P_X(x) P_Y(y)} \\
&= H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y).
\end{aligned} \tag{2.30}$$
Hence, inequality (2.8) may be obtained from the above formula and the positivity of $I(X : Y)$. Further, we can define a conditional mutual information in a manner similar to that of the entropy. This quantity involves another random variable $Z$ (in addition to $X$ and $Y$) and is defined as

$$\begin{aligned}
I(X : Y|Z) &:= \sum_z P_Z(z) I(X : Y|Z = z) \\
&= \sum_{x,y,z} P_{X,Y,Z}(x, y, z) \log \frac{P_{XY|Z}(x, y|z)}{P_{X|Z}(x|z) P_{Y|Z}(y|z)},
\end{aligned} \tag{2.31}$$

where $I(X : Y|Z = z)$ is the mutual information of $X$ and $Y$ assuming that $Z = z$ is known. By applying (2.5) and (2.30) to the case $Z = z$, we obtain

$$\begin{aligned}
I(X : Y|Z) &= H(X|Z) + H(Y|Z) - H(XY|Z) = H(X|Z) - H(X|YZ) \\
&= -(H(X) - H(X|Z)) + (H(X) - H(X|YZ)) \\
&= -I(X : Z) + I(X : YZ).
\end{aligned}$$

This equation is called the chain rule of mutual information, which may also be written as

$$I(X : YZ) = I(X : Z) + I(X : Y|Z). \tag{2.32}$$

Hence, it follows that

$$I(X : YZ) \ge I(X : Z).$$

Note that (2.32) can be generalized as

$$I(X : YZ|U) = I(X : Z|U) + I(X : Y|ZU). \tag{2.33}$$


Next, we apply the above argument to the case where the information channel is given by a stochastic transition matrix $Q = (Q^x_y)$ and the input distribution is given by $p$. Let $X$ and $Y$ be, respectively, the random variables of the input system and output system. That is, their joint distribution is given as $P_{X,Y}(x, y) = Q^x_y p_x$. Then, the mutual information $I(X : Y)$ can be regarded as the amount of information transmitted via channel $Q$ when the input signal is generated with the distribution $p$. This is called transmission information, and it is denoted by $I(p, Q)$. Therefore, we can define the transmission information by

$$I(p, Q) := H(Q(p)) - \sum_x p_x H(Q^x). \tag{2.34}$$
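The transmission information (2.34) can be compared with the mutual information of the joint distribution $P_{X,Y}(x, y) = Q^x_y p_x$. The sketch below uses a binary symmetric channel with flip probability 0.1 as a hypothetical example; `Q[x]` holds the output distribution $Q^x$.

```python
import math

def entropy(probs):
    return -sum(v * math.log(v) for v in probs if v > 0)

def transmission_information(p, Q):
    """I(p, Q) = H(Q(p)) - sum_x p_x H(Q^x)."""
    out = [sum(p[x] * Q[x][y] for x in range(len(p)))
           for y in range(len(Q[0]))]
    return entropy(out) - sum(p[x] * entropy(Q[x]) for x in range(len(p)))

def mutual_information(joint):
    """I(X:Y) = D(P_XY || P_X x P_Y) for a joint table joint[x][y]."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(v * math.log(v / (p_x[x] * p_y[y]))
               for x, row in enumerate(joint)
               for y, v in enumerate(row) if v > 0)

p = [0.5, 0.5]
Q = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical binary symmetric channel
joint = [[p[x] * Q[x][y] for y in range(2)] for x in range(2)]

print(transmission_information(p, Q), mutual_information(joint))
```

For this channel both expressions equal $\log 2 - h(0.1)$.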
We will now discuss Fano’s inequality, which is given by the following theorem. 


Theorem 2.2 (Fano [4]) Let $X$ and $Y$ be random variables that take values in the same data set $N_k = \{1, \ldots, k\}$. Then, the following inequality holds:

$$\begin{aligned}
H(X|Y) &\le P\{X \neq Y\} \log(k - 1) + h(P\{X \neq Y\}) \\
&\le P\{X \neq Y\} \log k + \log 2.
\end{aligned} \tag{2.35}$$

Proof We define the random variable $Z := \begin{cases} 0 & X = Y \\ 1 & X \neq Y \end{cases}$. Applying (2.5) to $X$ and $Z$ under the condition $Y = y$, we obtain

$$\begin{aligned}
H(X|Y = y) &= H(X, Z|Y = y) \\
&= \sum_z P_{Z|Y}(z|y) H(X|Z = z, Y = y) + H(Z|Y = y).
\end{aligned}$$

The first equality follows from the fact that the random variable $Z$ can be uniquely obtained from $X$. Taking the expectation with respect to $y$, we get

$$\begin{aligned}
H(X|Y) &= H(X|Z, Y) + H(Z|Y) \le H(X|Z, Y) + H(Z) \\
&= H(X|Z, Y) + h(P\{X \neq Y\}).
\end{aligned} \tag{2.36}$$

Applying (2.3), we have

$$H(X|Y = y, Z = 0) = 0, \quad H(X|Y = y, Z = 1) \le \log(k - 1).$$

Therefore,

$$H(X|Y, Z) \le P\{X \neq Y\} \log(k - 1). \tag{2.37}$$

Finally, combining (2.36) and (2.37), we obtain (2.35). ∎
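Fano's inequality (2.35) can be checked for a concrete joint distribution of $X$ and $Y$ on the same set. The 3 × 3 table below is hypothetical, and the helper name `fano_check` is chosen only for this sketch.

```python
import math

def entropy(probs):
    return -sum(v * math.log(v) for v in probs if v > 0)

def fano_check(joint):
    """Return (H(X|Y), first bound in (2.35)) for a k x k table joint[x][y]."""
    k = len(joint)
    p_y = [sum(joint[x][y] for x in range(k)) for y in range(k)]
    # Chain rule: H(X|Y) = H(X, Y) - H(Y).
    h_x_given_y = entropy([v for row in joint for v in row]) - entropy(p_y)
    p_err = sum(joint[x][y] for x in range(k) for y in range(k) if x != y)
    bound = p_err * math.log(k - 1) + entropy([p_err, 1.0 - p_err])
    return h_x_given_y, bound

# Hypothetical joint distribution concentrated near the diagonal X = Y.
joint = [[0.30, 0.02, 0.01],
         [0.03, 0.28, 0.02],
         [0.02, 0.02, 0.30]]

h_cond, fano_bound = fano_check(joint)
print(h_cond, fano_bound)
```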
Exercise 


2.21 Show the chain rule of conditional mutual information (2.33) based on (2.32). 


2.1.4 The Independent and Identical Condition and Rényi 
Entropy 


Given a probability distribution $p = \{p_i\}_{i=1}^{k}$, we define the Rényi entropy $H_{1-s}(p)$ of order $1 - s$ as

$$H_{1-s}(p) := \frac{\psi(s|p)}{s}, \quad \psi(s|p) := \log \sum_i p_i^{1-s} \tag{2.38}$$

for a real number $s$ in addition to the entropy $H(p)$. We will abbreviate the quantity $\psi(s|p)$ to $\psi(s)$ when there is no risk of ambiguity. When $0 < s < 1$, the quantity $\psi(s)$ is a positive quantity that is larger when the probability distribution is closer to the uniform distribution. When $s < 0$, the quantity $\psi(s)$ is a negative quantity that is smaller when the probability distribution is closer to the uniform distribution. Finally, when $s = 0$, the quantity $\psi(s)$ is equal to 0. The derivative $\psi'(0)$ of $\psi(s)$ at $s = 0$ is equal to $H(p)$.

Hence, the Rényi entropy $H_{1-s}(p)$ is always nonnegative, and the limit $\lim_{s \to 0} H_{1-s}(p)$ equals $H(p)$. Further, since $\psi(s)$ is convex, the Rényi entropy $H_{1-s}(p)$ is monotone increasing for $s$. In particular, the Rényi entropy $H_{1-s}(p_{\mathrm{mix},k})$ is equal to $\log k$. Hence, the Rényi entropy $H_{1-s}(p)$ expresses the amount of the uncertainty of the distribution $p$. We also define the minimum entropy $H_{\min}(p)$ and the maximum entropy $H_{\max}(p)$ as

$$H_{\min}(p) := -\log \max_i p_i, \quad H_{\max}(p) := \log |\{i \,|\, p_i > 0\}|. \tag{2.39}$$

Then, we obtain

$$\lim_{s \to -\infty} H_{1-s}(p) = H_{\min}(p), \quad \lim_{s \to 1} H_{1-s}(p) = H_{\max}(p). \tag{2.40}$$

These give the minimum and the maximum of the Rényi entropies $H_{1-s}(p)$.
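The properties of the Rényi entropy listed above can be verified numerically. The sketch assumes $\psi(s|p) = \log \sum_i p_i^{1-s}$ and $H_{1-s}(p) = \psi(s|p)/s$; the distribution is a hypothetical example.

```python
import math

def psi(s, p):
    """psi(s|p) = log sum_i p_i^(1-s), summing over the support of p."""
    return math.log(sum(pi ** (1 - s) for pi in p if pi > 0))

def renyi_entropy(s, p):
    """H_{1-s}(p) = psi(s|p) / s, defined for s != 0."""
    return psi(s, p) / s

p = [0.5, 0.25, 0.125, 0.125]
shannon = -sum(pi * math.log(pi) for pi in p)
h_min = -math.log(max(p))
h_max = math.log(sum(1 for pi in p if pi > 0))

grid = [-5.0, -1.0, -0.1, 0.1, 0.5, 1.0]
values = [renyi_entropy(s, p) for s in grid]
print(values, shannon, h_min, h_max)
```

The values increase along the grid from near $H_{\min}(p)$ toward $H_{\max}(p)$ and pass through $H(p)$ near $s = 0$.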
Now consider data $i_1, \ldots, i_n$ that are generated independently with the same probability distribution $p = \{p_i\}_{i=1}^{k}$. The probability of obtaining a particular data sequence $i^n = (i_1, \ldots, i_n)$ is given by $p_{i_1} \cdots p_{i_n}$. This probability distribution is called an $n$-fold independent and identical distribution (abbreviated as $n$-i.i.d.) and denoted by $p^n$. Then, we have $\psi(s|p^n) = n\psi(s|p)$, i.e., $H_{1-s}(p^n) = n H_{1-s}(p)$. When a sufficiently large number $n$ of data are generated according to the independent and identical condition, the behavior of the distribution may be characterized by the entropy and the Rényi entropy.

The probability of the likelihood being at most $a > 0$ under the probability
distribution $p$, i.e., the probability that $\{p_i \le a\}$, is

$$p\{p_i \le a\} = \sum_{i: p_i \le a} p_i \le \sum_{i: p_i \le a} \left(\frac{a}{p_i}\right)^s p_i \le \sum_{i=1}^k \left(\frac{a}{p_i}\right)^s p_i = e^{\psi(s) + s \log a} \tag{2.41}$$

if $0 \le s \le 1$. Accordingly,

$$p^n\{p^n_{i^n} \le e^{-nR}\} \le e^{n \min_{0 \le s \le 1}(\psi(s) - sR)}. \tag{2.42}$$

Conversely, the probability of the likelihood being greater than $a$, i.e., the
probability that $\{p_i > a\}$, is

$$p\{p_i > a\} = \sum_{i: p_i > a} p_i \le \sum_{i: p_i > a} \left(\frac{a}{p_i}\right)^s p_i \le \sum_{i=1}^k \left(\frac{a}{p_i}\right)^s p_i = e^{\psi(s) + s \log a} \tag{2.43}$$

if $s \le 0$. Similarly, we obtain

$$p^n\{p^n_{i^n} > e^{-nR}\} \le e^{n \min_{s \le 0}(\psi(s) - sR)}. \tag{2.44}$$

The exponential decreasing rate (exponent) on the right-hand side (RHS) of (2.42)
is negative when $R > H(p)$. Hence, the probability $p^n\{p^n_{i^n} \le e^{-nR}\}$ approaches 0
exponentially. This fact can be shown as follows. Choosing a small $s_1 > 0$, we have
$H_{1-s_1}(p) - R < 0$. Hence, we have

$$\min_{0 \le s \le 1}(\psi(s) - sR) = \min_{0 \le s \le 1} s(H_{1-s}(p) - R) \le s_1(H_{1-s_1}(p) - R) < 0. \tag{2.45}$$

Hence, we see that the exponent on the RHS of (2.42) is negative. Conversely, the
exponent on the RHS of (2.44) is negative when $R < H(p)$, and the probability
$p^n\{p^n_{i^n} > e^{-nR}\}$ approaches 0 exponentially. This can be verified in the same way as (2.45) by
choosing $s_2 < 0$ with a sufficiently small absolute value.
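As a rough sanity check of (2.42), the sketch below compares the exact tail probability with the bound for a Bernoulli distribution. The choice $q = 0.2$, the rate $R = H(p) + 0.2$, and the grid search over $s$ are illustrative assumptions; since (2.41) holds for every $s \in [0,1]$, the grid minimum still gives a valid (if slightly weaker) bound.

```python
import math

def psi(p, s):
    """psi(s|p) = log sum_i p_i^(1-s)."""
    return math.log(sum(pi ** (1.0 - s) for pi in p if pi > 0))

def tail_prob_small_likelihood(q, n, R):
    """Exact p^n{ p^n(i^n) <= e^{-nR} } for the Bernoulli distribution (1-q, q)."""
    total = 0.0
    for k in range(n + 1):                      # k = number of ones in the sequence
        log_lik = k * math.log(q) + (n - k) * math.log(1 - q)
        if log_lik <= -n * R:
            total += math.comb(n, k) * math.exp(log_lik)
    return total

def bound_2_42(p, n, R, grid=1000):
    """RHS of (2.42), with the minimum over s in [0, 1] taken on a grid."""
    exponent = min(psi(p, j / grid) - (j / grid) * R for j in range(grid + 1))
    return math.exp(n * exponent)

q = 0.2
p = [1 - q, q]
H = -sum(pi * math.log(pi) for pi in p)         # Shannon entropy H(p)
R = H + 0.2                                     # a rate above the entropy
for n in (10, 50, 200):
    print(n, tail_prob_small_likelihood(q, n, R), bound_2_42(p, n, R))
```

Both columns decay as $n$ grows; the bound is not tight, but it captures the exponential decrease.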

We may generalize this argument for the likelihood $q^n_{i^n}$ of a different probability
distribution $q$ as follows. Defining $\tilde\psi(s) := \log \sum_i p_i q_i^{-s}$, we can show that

$$p^n\{q^n_{i^n} \le e^{-nR}\} \le e^{n \min_{0 \le s}(\tilde\psi(s) - sR)}, \tag{2.46}$$

$$p^n\{q^n_{i^n} > e^{-nR}\} \le e^{n \min_{s \le 0}(\tilde\psi(s) - sR)}. \tag{2.47}$$


The Rényi entropy $H_{1-s}(p)$ and the entropy $H(p)$ express the concentration of
probability under independent and identical distributions with a sufficiently large
number of data. To investigate the concentration, let us consider the probability
$P(p, L)$ of the most frequent $L$ outcomes (footnote 5) for a given probability distribution
$p = (p_i)_{i=1}^k$. This can be written as

$$P(p, L) := \sum_{i=1}^{L} p_i^{\downarrow}, \tag{2.48}$$

where $p_i^{\downarrow}$ are the elements of $p_i$ reordered in decreasing order. Let us analyze
this by reexamining the set $\{p_i > a\}$. The number of elements of the set, $|\{p_i > a\}|$,
is evaluated as

$$|\{p_i > a\}| \le \sum_{i: p_i > a} \left(\frac{p_i}{a}\right)^{1-s} \le \sum_{i=1}^k \left(\frac{p_i}{a}\right)^{1-s} = e^{\psi(s) - (1-s)\log a} \tag{2.49}$$

when $0 \le s \le 1$. By using (2.41) and defining $b(s, R) := e^{\frac{\psi(s) - R}{1-s}}$ for $R$ and $0 \le s < 1$, we have

$$|\{p_i > b(s,R)\}| \le e^{R}, \qquad p\{p_i \le b(s,R)\} \le e^{\frac{\psi(s) - sR}{1-s}}.$$

We choose $s_0 := \operatorname{argmin}_{0 \le s \le 1} \frac{\psi(s) - sR}{1-s}$ (footnote 6) and define $P^c(p, e^R) := 1 - P(p, e^R)$;
hence,

$$P^c(p, e^R) \le e^{\frac{\psi(s_0) - s_0 R}{1-s_0}} = e^{\min_{0 \le s \le 1} \frac{\psi(s) - sR}{1-s}}. \tag{2.50}$$

Applying this argument to the n-i.i.d. $p^n$, we have

$$P^c(p^n, e^{nR}) \le e^{n\frac{\psi(s_0) - s_0 R}{1-s_0}} = e^{n \min_{0 \le s \le 1} \frac{\psi(s) - sR}{1-s}}. \tag{2.51}$$

Now, we let $R > H(p)$ and choose a sufficiently small number $0 < s_1 < 1$. Then,
inequality (2.45) yields

$$\min_{0 \le s < 1} \frac{\psi(s) - sR}{1-s} = \min_{0 \le s < 1} \frac{s(H_{1-s}(p) - R)}{1-s} \le \frac{s_1(H_{1-s_1}(p) - R)}{1-s_1} < 0.$$

Footnote 5: If $L$ is not an integer, we consider the largest integer that does not exceed $L$.
Footnote 6: $\operatorname{argmin}_{0 \le s \le 1} f(s)$ returns the value of $s$ that yields $\min_{0 \le s \le 1} f(s)$. argmax is similarly defined.

Hence, the probability $P^c(p^n, e^{nR})$ approaches 0 exponentially. That implies that the
probabilities are almost concentrated on the most frequent $e^{nR}$ elements because
$1 - P^c(p^n, e^{nR})$ equals the probability on the most frequent $e^{nR}$ elements. Since this holds
when $R > H(p)$, most of the probabilities are concentrated on $e^{nH(p)}$ elements.
Therefore, this can be interpreted as meaning that the entropy $H(p)$ asymptotically
expresses the degree of concentration. This will play an important role in problems
such as source coding, which will be discussed later.
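The concentration statement can be illustrated numerically. The sketch below, under the assumption of a Bernoulli distribution with $q = 0.2$ and a rate $R = H(p) + 0.1$ (both illustrative), computes $P^c(p^n, e^{nR})$ exactly by grouping sequences by their number of ones and compares it with the bound (2.51) evaluated on a grid of $s$.

```python
import math

def psi(p, s):
    """psi(s|p) = log sum_i p_i^(1-s)."""
    return math.log(sum(pi ** (1.0 - s) for pi in p if pi > 0))

def top_mass(q, n, L):
    """P(p^n, L) for Bernoulli p = (1-q, q) with q < 1/2:
    total probability of the L most likely length-n sequences."""
    remaining, mass = L, 0.0
    for k in range(n + 1):                      # fewer ones means more likely when q < 1/2
        count = min(math.comb(n, k), remaining)
        mass += count * math.exp(k * math.log(q) + (n - k) * math.log(1 - q))
        remaining -= count
        if remaining == 0:
            break
    return mass

def bound_2_51(p, n, R, grid=1000):
    """RHS of (2.51), minimizing over a grid of s in [0, 1)."""
    exponent = min((psi(p, j / grid) - (j / grid) * R) / (1 - j / grid)
                   for j in range(grid))
    return math.exp(n * exponent)

q = 0.2
p = [1 - q, q]
H = -sum(pi * math.log(pi) for pi in p)
R = H + 0.1                                     # above H(p), below log 2
for n in (20, 100, 200):
    L = math.floor(math.exp(n * R))             # footnote 5: round e^{nR} down
    print(n, 1.0 - top_mass(q, n, L), bound_2_51(p, n, R))
```

The first printed column is the mass outside the $e^{nR}$ most likely sequences; it shrinks with $n$ and stays below the second column.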

On the other hand, when $H(p) > R$, $P(p^n, e^{nR})$ approaches 0. To prove this, let
us consider the following inequality for an arbitrary subset $A$:

$$p(A) \le a|A| + p\{p_i > a\}. \tag{2.52}$$

We can prove this inequality by considering the decomposition $A = (A \cap \{p_i \le a\}) \cup (A \cap \{p_i > a\})$.
Defining $R := \log|A|$ and $a := e^{\frac{\psi(s) - R}{1-s}}$ and using (2.43), we obtain
$p(A) \le 2 e^{\frac{\psi(s) - sR}{1-s}}$ for $s \le 0$. Therefore,

$$P(p, e^R) \le 2 e^{\min_{s \le 0} \frac{\psi(s) - sR}{1-s}}, \tag{2.53}$$

and we obtain

$$P(p^n, e^{nR}) \le 2 e^{n \min_{s \le 0} \frac{\psi(s) - sR}{1-s}}. \tag{2.54}$$

We also note that in order to avoid $P(p^n, e^{nR}) \to 0$, we require $R \ge H(p)$, because
$\min_{s \le 0} \frac{\psi(s) - sR}{1-s} < 0$ whenever $R < H(p)$.


Exercises 
2.22 Show that $\psi(s|p_A \times p_B) = \psi(s|p_A) + \psi(s|p_B)$.

2.23 Define the distribution $p_s(x) := p(x)^{1-s} e^{-\psi(s)}$ and assume that a distribution
$q$ satisfies $H(q) = H(p_s)$. Show that $D(p_s\|p) \le D(q\|p)$ for $s < 1$ by following the
steps below.

(a) Show that $\frac{1}{1-s} D(q\|p_s) = \frac{1}{1-s}\sum_x q(x)\log q(x) - \sum_x q(x)\log p(x) + \frac{\psi(s)}{1-s}$.
(b) Show that $D(q\|p) - \frac{1}{1-s} D(q\|p_s) = D(p_s\|p)$.
(c) Show the desired inequality.


2.24 Show the equation

$$\sup_{0 \le s \le 1} \frac{sR - \psi(s)}{1-s} = \min_{q: H(q) \ge R} D(q\|p) \tag{2.55}$$

following the steps below.

(a) Show that $\frac{sR - \psi(s)}{1-s} \le 0$ for $R \le H(p)$ and $s \in [0, 1]$.
(b) Show that both sides of (2.55) are zero when $R \le H(p)$.
(c) Show that

$$H(p_s) = (1-s)\psi'(s) + \psi(s), \tag{2.56}$$
$$D(p_s\|p) = s\psi'(s) - \psi(s). \tag{2.57}$$

(d) Show that

$$\frac{d}{ds}\big((1-s)\psi'(s) + \psi(s)\big) = (1-s)\psi''(s) > 0, \tag{2.58}$$
$$\frac{d}{ds}\big(s\psi'(s) - \psi(s)\big) = s\psi''(s) > 0 \tag{2.59}$$

for $s \in (0, 1)$.
(e) In the following, we consider the case $R > H(p)$. Show that there uniquely exists
$s_R \in (0, 1)$ such that $H(p_{s_R}) = R$.
(f) Show that

$$\min_{q: H(q) = R} D(q\|p) = D(p_{s_R}\|p). \tag{2.60}$$

(g) Show that

$$\min_{q: H(q) \ge R} D(q\|p) = D(p_{s_R}\|p). \tag{2.61}$$

(h) Show that

$$D(p_{s_R}\|p) = \frac{s_R R - \psi(s_R)}{1 - s_R}. \tag{2.62}$$

(i) Show that

$$\frac{d}{ds}\frac{sR - \psi(s)}{1-s} = \frac{R + (s-1)\psi'(s) - \psi(s)}{(1-s)^2}. \tag{2.63}$$

(j) Show that

$$\sup_{0 \le s \le 1} \frac{sR - \psi(s)}{1-s} = \frac{s_R R - \psi(s_R)}{1 - s_R}. \tag{2.64}$$

(k) Show (2.55).
2.25 Show that

$$\sup_{s \le 0} \frac{sR - \psi(s)}{1-s} = \min_{q: H(q) \le R} D(q\|p) \tag{2.65}$$

following the steps below.

(a) Show that there uniquely exists $s_R < 0$ such that $H(p_{s_R}) = R$.
(b) Show that

$$\min_{q: H(q) = R} D(q\|p) = D(p_{s_R}\|p). \tag{2.66}$$

(c) Show that

$$\min_{q: H(q) \le R} D(q\|p) = D(p_{s_R}\|p). \tag{2.67}$$

(d) Show that

$$D(p_{s_R}\|p) = \frac{s_R R - \psi(s_R)}{1 - s_R}. \tag{2.68}$$

(e) Show that

$$\sup_{s \le 0} \frac{sR - \psi(s)}{1-s} = \frac{s_R R - \psi(s_R)}{1 - s_R}. \tag{2.69}$$

(f) Show (2.65).
2.26 Assume that R < Hypin(p). Show that 
sup — min DIP) = Hnin(P) (2.70) 
s<o l—s q:H(q)=0 
2.27 Show that

$$-\log \max_i p_i \le H_\alpha(p) \le -\log \min_i p_i \tag{2.71}$$

for $\alpha \ge 0$.


2.1.5 Conditional Rényi Entropy 


Next, we consider the conditional extension of Rényi entropy. For this purpose, we
focus on the following relation between the conditional entropy and the relative
entropy. For a given joint distribution $P_{XY}$ on $\mathcal{X} \times \mathcal{Y}$, we have two characterizations
of the conditional entropy (Exercise 2.28):

$$H(X|Y) = \log|\mathcal{X}| - D(P_{XY} \| P_{\mathrm{mix},\mathcal{X}} \times P_Y), \tag{2.72}$$
$$H(X|Y) = \log|\mathcal{X}| - \min_{Q_Y} D(P_{XY} \| P_{\mathrm{mix},\mathcal{X}} \times Q_Y). \tag{2.73}$$

Based on the above relations, we define two kinds of conditional Rényi entropies
for $s \in (-1, \infty)\setminus\{0\}$ as follows.

$$H_{1+s}(X|Y) := \log|\mathcal{X}| - D_{1+s}(P_{XY} \| P_{\mathrm{mix},\mathcal{X}} \times P_Y) = -\frac{1}{s}\log \sum_y P_Y(y) \sum_x P_{X|Y}(x|y)^{1+s}, \tag{2.74}$$

$$H^{\uparrow}_{1+s}(X|Y) := \log|\mathcal{X}| - \min_{Q_Y} D_{1+s}(P_{XY} \| P_{\mathrm{mix},\mathcal{X}} \times Q_Y) = \max_{Q_Y} -\frac{1}{s}\log \sum_{x,y} P_{XY}(x,y)^{1+s} Q_Y(y)^{-s}, \tag{2.75}$$

where $Q_Y$ is an arbitrary distribution on $\mathcal{Y}$. In the case of $s = 0$, they are defined as
$H(X|Y)$ because (Exercise 2.29)

$$\lim_{s \to 0} H_{1+s}(X|Y) = \lim_{s \to 0} H^{\uparrow}_{1+s}(X|Y) = H(X|Y). \tag{2.76}$$

Similarly to the relations (2.40), the conditional minimum entropies $H_{\min}(X|Y)$
and $H^{\uparrow}_{\min}(X|Y)$ and the conditional maximum entropies $H_{\max}(X|Y)$ and $H^{\uparrow}_{\max}(X|Y)$
are defined as

$$H_{\min}(X|Y) := \lim_{s \to \infty} H_{1+s}(X|Y), \qquad H^{\uparrow}_{\min}(X|Y) := \lim_{s \to \infty} H^{\uparrow}_{1+s}(X|Y), \tag{2.77}$$

$$H_{\max}(X|Y) := \lim_{s \to -1} H_{1+s}(X|Y), \qquad H^{\uparrow}_{\max}(X|Y) := \lim_{s \to -1} H^{\uparrow}_{1+s}(X|Y). \tag{2.78}$$

From the definition, we find the relation

$$H_{1+s}(X|Y) \le H^{\uparrow}_{1+s}(X|Y). \tag{2.79}$$

Unfortunately, these two conditional Rényi entropies are not the same in general.
Thanks to the property of the relative Rényi entropy, we have the following
lemma (Exercise 2.31).

Lemma 2.1 The functions $s \mapsto sH_{1+s}(X|Y)$ and $s \mapsto sH^{\uparrow}_{1+s}(X|Y)$ are concave for
$s \in (-1, \infty)$. The functions $s \mapsto H_{1+s}(X|Y)$ and $s \mapsto H^{\uparrow}_{1+s}(X|Y)$ are monotonically
decreasing.


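Lemma 2.2 The quantity $H^{\uparrow}_{1+s}(X|Y)$ has the following form.

$$H^{\uparrow}_{1+s}(X|Y) = \log|\mathcal{X}| - D_{1+s}(P_{XY} \| P_{\mathrm{mix},\mathcal{X}} \times P_Y^{(1+s)}) \tag{2.80}$$
$$= -\frac{1+s}{s}\log \sum_y P_Y(y)\Big(\sum_x P_{X|Y}(x|y)^{1+s}\Big)^{\frac{1}{1+s}} \tag{2.81}$$
$$= -\frac{1+s}{s}\log \sum_y \Big(\sum_x P_{XY}(x,y)^{1+s}\Big)^{\frac{1}{1+s}}, \tag{2.82}$$

where $P_Y^{(1+s)}(y) := \dfrac{\big(\sum_x P_{XY}(x,y)^{1+s}\big)^{\frac{1}{1+s}}}{\sum_{y'}\big(\sum_x P_{XY}(x,y')^{1+s}\big)^{\frac{1}{1+s}}}$.

Proof Substituting $\sum_x P_{XY}(x,y)^{1+s}$ and $Q_Y(y)^{-s}$ to $f$ and $g$ in the reverse Hölder
inequality (A.27) with $p = \frac{1}{1+s}$ and $q = -\frac{1}{s}$, we obtain

$$e^{-s(\log|\mathcal{X}| - D_{1+s}(P_{XY} \| P_{\mathrm{mix},\mathcal{X}} \times Q_Y))} = \sum_y \sum_x P_{XY}(x,y)^{1+s} Q_Y(y)^{-s}$$
$$\ge \Big(\sum_y \Big(\sum_x P_{XY}(x,y)^{1+s}\Big)^{\frac{1}{1+s}}\Big)^{1+s} \Big(\sum_y Q_Y(y)\Big)^{-s} = \Big(\sum_y \Big(\sum_x P_{XY}(x,y)^{1+s}\Big)^{\frac{1}{1+s}}\Big)^{1+s}$$

for $s \in (0, \infty]$. Since the equality holds when $Q_Y(y) = P_Y^{(1+s)}(y)$, we obtain

$$e^{-sH^{\uparrow}_{1+s}(X|Y)} = \Big(\sum_y \Big(\sum_x P_{XY}(x,y)^{1+s}\Big)^{\frac{1}{1+s}}\Big)^{1+s},$$

which implies (2.81) with $s \in (0, \infty]$.
The same substitution to the Hölder inequality (A.25) yields

$$e^{-s(\log|\mathcal{X}| - D_{1+s}(P_{XY} \| P_{\mathrm{mix},\mathcal{X}} \times Q_Y))} \le \Big(\sum_y \Big(\sum_x P_{XY}(x,y)^{1+s}\Big)^{\frac{1}{1+s}}\Big)^{1+s}$$

for $s \in (-1, 0)$. Since the equality holds when $Q_Y(y) = P_Y^{(1+s)}(y)$, we obtain (2.81)
with $s \in (-1, 0)$.
Finally, (2.82) follows from a simple calculation. □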


Taking the limits $s \to -1$ and $s \to \infty$ in Lemma 2.2, we obtain the following
lemma (Exercise 2.30).

Lemma 2.3 The quantities $H_{\min}(X|Y)$, $H^{\uparrow}_{\min}(X|Y)$, $H_{\max}(X|Y)$, and $H^{\uparrow}_{\max}(X|Y)$
are characterized as

$$H_{\min}(X|Y) = -\log \max_{x,y: P_Y(y) > 0} P_{X|Y=y}(x), \tag{2.83}$$
$$H^{\uparrow}_{\min}(X|Y) = -\log \sum_y P_Y(y) \max_x P_{X|Y=y}(x), \tag{2.84}$$
$$H_{\max}(X|Y) = \log \sum_y P_Y(y)\, |\{x \mid P_{X|Y=y}(x) > 0\}|, \tag{2.85}$$
$$H^{\uparrow}_{\max}(X|Y) = \log \max_{y: P_Y(y) > 0} |\{x \mid P_{X|Y=y}(x) > 0\}|. \tag{2.86}$$


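Further, as an inequality opposite to (2.79), we have

Lemma 2.4 ([5, Lemma 5]) For $s \in (-1, 1)\setminus\{0\}$, we have

$$H_{1+s}(X|Y) \ge H^{\uparrow}_{\frac{1}{1-s}}(X|Y). \tag{2.87}$$

Proof First, we consider the case with $s \in (0, 1)$. Substituting $P_{XY}(x,y)$ and
$\big(\frac{P_{XY}(x,y)}{P_Y(y)}\big)^s$ to $f$ and $g$ in the Hölder inequality (A.25) with $p = \frac{1}{1-s}$ and $q = \frac{1}{s}$,
we obtain

$$e^{-sH_{1+s}(X|Y)} = \sum_y \sum_x P_{XY}(x,y)\Big(\frac{P_{XY}(x,y)}{P_Y(y)}\Big)^s$$
$$\le \sum_y \Big(\sum_x P_{XY}(x,y)^{\frac{1}{1-s}}\Big)^{1-s}\Big(\sum_{x'} \frac{P_{XY}(x',y)}{P_Y(y)}\Big)^{s} \tag{2.88}$$
$$= \sum_y \Big(\sum_x P_{XY}(x,y)^{\frac{1}{1-s}}\Big)^{1-s} = e^{-sH^{\uparrow}_{\frac{1}{1-s}}(X|Y)}$$

for $s \in (0, 1)$ because $\sum_{x'} \frac{P_{XY}(x',y)}{P_Y(y)} = 1$.

Next, we consider the case with $s \in (-1, 0)$. The same substitution to the reverse
Hölder inequality (A.27) with $p = \frac{1}{1-s}$ and $q = \frac{1}{s}$ yields

$$e^{-sH_{1+s}(X|Y)} \ge e^{-sH^{\uparrow}_{\frac{1}{1-s}}(X|Y)}$$

because $\big(\sum_{x'} \frac{P_{XY}(x',y)}{P_Y(y)}\big)^{s} = 1$. In both cases, taking the logarithm and dividing by $-s$ gives (2.87). □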
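The two conditional Rényi entropies can be compared numerically. The sketch below is illustrative only: it evaluates $H_{1+s}(X|Y)$ from (2.74) and $H^{\uparrow}_{1+s}(X|Y)$ from the closed form of Lemma 2.2 on a small, arbitrarily chosen joint distribution, and checks the relation (2.79) together with the inequality (2.87). The table `P` and the function names are assumptions made for this example.

```python
import math

def cond_renyi(P, s):
    """H_{1+s}(X|Y) via (2.74); P[y][x] is the joint probability P_XY(x, y)."""
    total = 0.0
    for row in P:
        py = sum(row)
        if py > 0:
            total += py * sum((pxy / py) ** (1 + s) for pxy in row if pxy > 0)
    return -math.log(total) / s

def cond_renyi_up(P, s):
    """H^up_{1+s}(X|Y) via the closed form in Lemma 2.2."""
    total = sum(sum(pxy ** (1 + s) for pxy in row if pxy > 0) ** (1 / (1 + s))
                for row in P)
    return -(1 + s) / s * math.log(total)

def cond_entropy(P):
    """Shannon conditional entropy H(X|Y)."""
    return -sum(pxy * math.log(pxy / sum(row))
                for row in P for pxy in row if pxy > 0)

# Illustrative joint distribution: rows are indexed by y, columns by x.
P = [[0.30, 0.10, 0.10],
     [0.05, 0.05, 0.40]]

for s in (-0.5, 0.3, 0.8):
    lower = cond_renyi_up(P, s / (1 - s))   # H^up of order 1/(1-s)
    print(s, lower, cond_renyi(P, s), cond_renyi_up(P, s))
print(cond_entropy(P), cond_renyi(P, 1e-6), cond_renyi_up(P, 1e-6))
```

In each printed row the three numbers should be nondecreasing from left to right, and the last line shows both quantities approaching $H(X|Y)$ as $s \to 0$, in line with (2.76).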


Now, we consider the meaning of the two kinds of conditional Rényi entropies. For this
purpose, we discuss the case when $P_{X^nY^n}$ is the independent and identical distribution
of $P_{XY}$. Applying (2.42) and (2.44) to the distribution $P_{X^n|Y^n=y}$ and taking the average
with respect to $y$ under the distribution $P_{Y^n}$, we have

$$P_{X^nY^n}\{(x,y) \mid P_{X^n|Y^n}(x|y) \le e^{-nR}\} \le e^{n \min_{-1 \le s \le 0} s(R - H_{1+s}(X|Y))}, \tag{2.89}$$
$$P_{X^nY^n}\{(x,y) \mid P_{X^n|Y^n}(x|y) > e^{-nR}\} \le e^{n \min_{s \ge 0} s(R - H_{1+s}(X|Y))}, \tag{2.90}$$

which gives an operational meaning of the conditional Rényi entropy $H_{1+s}(X|Y)$.
Similarly, applying (2.50) and (2.53) to the distribution $P_{X^n|Y^n=y}$ and taking the
average with respect to $y$ under the distribution $P_{Y^n}$, we have

$$\sum_y P_{Y^n}(y) P^c(P_{X^n|Y^n=y}, e^{nR}) \le e^{n \min_{-1 \le s \le 0} \frac{s}{1+s}(R - H^{\uparrow}_{1+s}(X|Y))}, \tag{2.91}$$
$$\sum_y P_{Y^n}(y) P(P_{X^n|Y^n=y}, e^{nR}) \le 2e^{n \min_{s \ge 0} \frac{s}{1+s}(R - H^{\uparrow}_{1+s}(X|Y))}, \tag{2.92}$$

which gives an operational meaning of the conditional Rényi entropy $H^{\uparrow}_{1+s}(X|Y)$.

These inequalities clarify the difference between the two kinds of conditional Rényi
entropies.


Exercises 


2.28 Show (2.72) and (2.73). 
2.29 Show (2.76). 

2.30 Show Lemma 2.3. 

2.31 Show Lemma 2.1. 


2.32 Show that the equality in (2.87) holds for a real s € (—1, 1)\{O} if and only if 
Pxy@,y) = aPr(y). 


2.2 Geometry of Probability Distribution Family 


2.2.1 Inner Product for Random Variables and Fisher 
Information 


In Sect. 2.1, we introduced the mutual information $I(X : Y)$ as a quantity that
expresses the correlation between two random variables $X$ and $Y$. However, for
calculating this quantity, one must calculate the logarithm of each probability, which
is rather tedious. We now introduce the covariance $\mathrm{Cov}_p(X, Y)$
as a quantity that expresses the correlation between two real-valued random variables
$X$ and $Y$. Generally, calculations involving the covariance are less tedious than those
of mutual information. Given a probability distribution $p$ in a probability space $\Omega$,
the covariance is defined as

$$\mathrm{Cov}_p(X, Y) := \sum_{\omega} (X(\omega) - \mathrm{E}_p(X))(Y(\omega) - \mathrm{E}_p(Y))\, p(\omega). \tag{2.93}$$

If $X$ and $Y$ are independent, the covariance $\mathrm{Cov}_p(X, Y)$ is equal to 0 (Exercise 2.33). Thus far
it has not been necessary to specify the probability distribution, and therefore we
had no difficulties in using notations such as $H(X)$ and $I(X : Y)$. However, since it
is important to emphasize the probability distribution treated in our discussion, we
will use the above notation without abbreviation. If $X$ and $Y$ are the same, the
covariance $\mathrm{Cov}_p(X, Y)$ coincides with the variance $\mathrm{V}_p(X)$ of $X$:

$$\mathrm{V}_p(X) := \mathrm{Cov}_p(X, X) = \sum_{\omega} (X(\omega) - \mathrm{E}_p(X))^2 p(\omega). \tag{2.94}$$

Given real-valued random variables $X_1, \ldots, X_d$, the matrix $\mathrm{Cov}_p(X_k, X_j)$ is called a
covariance matrix. Now, starting from a given probability distribution $p$, we define
the inner product in the space of real-valued random variables as (footnote 7)

$$\langle A, B\rangle^{(e)}_p := \sum_{\omega} A(\omega) B(\omega) p(\omega). \tag{2.95}$$

Then, the covariance $\mathrm{Cov}_p(X, Y)$ is equal to the above inner product between the
two real-valued random variables $(X(\omega) - \mathrm{E}_p(X))$ and $(Y(\omega) - \mathrm{E}_p(Y))$ with zero
expectation. That is, the inner product (2.95) expresses the correlation between two
real-valued random variables with zero expectation in classical systems. This inner
product is also deeply related to statistical inference in another sense, as discussed
below.
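A small sketch of the covariance as the inner product (2.95) of centered random variables follows. The probability space, the distributions, and the random variables are illustrative assumptions; distributions and random variables are stored as dictionaries keyed by the outcome $\omega$.

```python
from itertools import product

def expectation(p, X):
    return sum(p[w] * X[w] for w in p)

def inner(p, A, B):
    """Inner product <A, B>_p^(e) = sum_w A(w) B(w) p(w), cf. (2.95)."""
    return sum(A[w] * B[w] * p[w] for w in p)

def covariance(p, X, Y):
    """Cov_p(X, Y) as the inner product of the centered variables, cf. (2.93)."""
    ex, ey = expectation(p, X), expectation(p, Y)
    Xc = {w: X[w] - ex for w in p}
    Yc = {w: Y[w] - ey for w in p}
    return inner(p, Xc, Yc)

# Product distribution on {0,1} x {0,1}: the two coordinates are independent.
pa, pb = [0.3, 0.7], [0.6, 0.4]
p = {(a, b): pa[a] * pb[b] for a, b in product(range(2), repeat=2)}
X = {w: float(w[0]) for w in p}          # depends on the first coordinate only
Y = {w: 3.0 * w[1] - 1.0 for w in p}     # depends on the second coordinate only

print(covariance(p, X, Y))   # vanishes up to rounding, since X and Y are independent
print(covariance(p, X, X))   # the variance of X, here 0.3 * 0.7
```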

When we observe $n$ independent real-valued random variables $X_1, \ldots, X_n$ identical
to the real-valued random variable $X$, the average value

$$X^n := \frac{X_1 + \cdots + X_n}{n} \tag{2.96}$$

converges to the expectation $\mathrm{E}_p(X)$ in probability. That is,

$$p^n\{|X^n - \mathrm{E}_p(X)| > \epsilon\} \to 0, \quad \forall \epsilon > 0, \tag{2.97}$$

which is called the law of large numbers. Further, the distribution of the real-valued
random variable

$$\sqrt{n}(X^n - \mathrm{E}_p(X)) \tag{2.98}$$

goes to the Gaussian distribution with the variance $V = \mathrm{V}_p(X)$:

$$P_{G,V}(x) := \frac{1}{\sqrt{2\pi V}} e^{-\frac{x^2}{2V}}, \tag{2.99}$$

i.e.,

$$p^n\{a \le \sqrt{n}(X^n - \mathrm{E}_p(X)) \le b\} \to \int_a^b P_{G,V}(x)\,dx, \tag{2.100}$$

which is called the central limit theorem. Hence, the asymptotic behavior is almost
characterized by the expectation $\mathrm{E}_p(X)$ and the variance $\mathrm{V}_p(X)$.

Footnote 7: The superscript (e) means "exponential." This is because $A$ corresponds to the exponential
representation, as discussed later.

For $k$ real-valued random variables $X_1, \ldots, X_k$, we can similarly define the real-valued
random variables $X_1^n, \ldots, X_k^n$. These converge to their expectations in probability.
The distribution of the real-valued random variables

$$\big(\sqrt{n}(X_1^n - \mathrm{E}_p(X_1)), \ldots, \sqrt{n}(X_k^n - \mathrm{E}_p(X_k))\big) \tag{2.101}$$

converges to the $k$-variate Gaussian distribution with the covariance matrix
$V = (\mathrm{Cov}_p(X_i, X_j))$:

$$P_{G,V}(x) := \frac{1}{\sqrt{(2\pi)^k \det V}} e^{-\frac{1}{2}\sum_{i,j} x_i (V^{-1})_{i,j} x_j}. \tag{2.102}$$

Therefore, the asymptotic behavior is almost described by the expectation and the
covariance matrix.
Consider the set of probability distributions $p_\theta$ parameterized by a single real
number $\theta$. For example, we can parameterize a binomial distribution with the
probability space $\{0, 1\}$ by $p_\theta(0) = \theta$, $p_\theta(1) = 1 - \theta$. When the set of probability
distributions is parameterized by a single parameter, it is called a probability
distribution family and is represented by $\{p_\theta \mid \theta \in \Theta \subset \mathbb{R}\}$. Based on a probability
distribution family, we can define the logarithmic derivative as

$$l_{\theta_0}(\omega) := \frac{d \log p_\theta(\omega)}{d\theta}\Big|_{\theta=\theta_0} = \frac{dp_\theta(\omega)}{d\theta}\Big|_{\theta=\theta_0} \Big/ p_{\theta_0}(\omega).$$

Since it is a real-valued function of the probability space, it
can be regarded as a real-valued random variable. We can consider that this quantity
expresses the sensitivity of the probability distribution to the variations in the
parameter $\theta$ around $\theta_0$. The Fisher metric (Fisher information) is defined as the
variance of the logarithmic derivative $l_{\theta_0}$. Since the expectation of $l_{\theta_0}$ with respect to
$p_{\theta_0}$ is 0, the Fisher information can also be defined as

$$J_\theta := \langle l_\theta, l_\theta\rangle^{(e)}_{p_\theta}. \tag{2.103}$$

Therefore, this quantity represents the amount of variation in the probability
distribution due to the variations in the parameter. Alternatively, it can indicate how much the
probability distribution family represents the information related to the parameter.
As discussed later, these ideas will be further refined from the viewpoint of statistical
inference. The Fisher information $J_\theta$ may also be expressed as the limits of relative
entropy and Hellinger distance (Exercises 2.35 and 2.36):

$$\frac{J_\theta}{2} = 4\lim_{\epsilon \to 0} \frac{d_2^2(p_\theta, p_{\theta+\epsilon})}{\epsilon^2} \tag{2.104}$$
$$= \lim_{\epsilon \to 0} \frac{D(p_\theta \| p_{\theta+\epsilon})}{\epsilon^2} = \lim_{\epsilon \to 0} \frac{D(p_{\theta+\epsilon} \| p_\theta)}{\epsilon^2}. \tag{2.105}$$

The Fisher information $J_\theta$ is also characterized by the limit of relative Rényi
entropy (Exercise 2.37):

$$\frac{J_\theta}{2} = \lim_{\epsilon \to 0} \frac{-\phi(s|p_\theta \| p_{\theta+\epsilon})}{\epsilon^2 s(1-s)}. \tag{2.106}$$
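The limits (2.105) and (2.106) can be checked numerically for the family $p_\theta(0) = \theta$, $p_\theta(1) = 1 - \theta$ mentioned above. The sketch below assumes the definition $\phi(s|p\|q) = \log\sum_i p_i^{1-s} q_i^{s}$ for the function appearing in (2.106); the values of $\theta$, $\epsilon$, and $s$ are illustrative.

```python
import math

def family(theta):
    """The family from the text: p_theta(0) = theta, p_theta(1) = 1 - theta."""
    return [theta, 1.0 - theta]

def fisher(theta):
    # Variance of the logarithmic derivative, whose values are
    # l(0) = 1/theta and l(1) = -1/(1 - theta); its expectation is zero.
    return theta * (1 / theta) ** 2 + (1 - theta) * (1 / (1 - theta)) ** 2

def kl(p, q):
    """Relative entropy D(p||q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def phi(s, p, q):
    """phi(s|p||q) = log sum_i p_i^(1-s) q_i^s (assumed definition)."""
    return math.log(sum(pi ** (1 - s) * qi ** s for pi, qi in zip(p, q)))

theta, eps, s = 0.3, 1e-3, 0.4
p0, p1 = family(theta), family(theta + eps)
print(fisher(theta) / 2)                                   # J_theta / 2
print(kl(p0, p1) / eps ** 2, kl(p1, p0) / eps ** 2)        # both sides of (2.105)
print(-phi(s, p0, p1) / (eps ** 2 * s * (1 - s)))          # (2.106)
```

For a small but finite $\epsilon$ the printed values agree only approximately; the discrepancy is of order $\epsilon$.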


Next, let us consider the probability distribution family $\{p_\theta \mid \theta \in \Theta \subset \mathbb{R}^d\}$ with
multiple parameters. For each parameter, we define the logarithmic derivative $l_{\theta:k}(\omega)$
as

$$l_{\theta:k}(\omega) := \frac{\partial \log p_\theta(\omega)}{\partial \theta^k} = \frac{\partial p_\theta(\omega)}{\partial \theta^k} \Big/ p_\theta(\omega).$$

We use the covariance matrix $\langle l_{\theta:k}, l_{\theta:j}\rangle^{(e)}_{p_\theta}$ for the logarithmic derivatives $l_{\theta:1}, \ldots, l_{\theta:d}$
instead of the Fisher information. This matrix is called the Fisher information
matrix and will be denoted by $J_\theta = (J_{\theta:k,j})$. This matrix takes the role of the Fisher
information when there are multiple parameters; we discuss this in greater detail
below.

This inner product is closely related to the conditional expectation as follows.
Suppose that we observe only the subsystem $\Omega_1$, although the total system is given
as $\Omega_1 \times \Omega_2$. Let us consider the real-valued random variable $X$ of the total system.
We denote the random variable describing the outcome in the probability space $\Omega_j$
by $Z_j$ for $j = 1, 2$. Then, depending on the distribution $p$ of the total system, the
conditional expectation $\kappa_p(X)$ of $X$ is defined as a function of $\omega_1 \in \Omega_1$ by

$$\kappa_p(X)(\omega_1) := \sum_{\omega_2 \in \Omega_2} p(Z_2 = \omega_2 | Z_1 = \omega_1) X(\omega_1, \omega_2). \tag{2.107}$$

Then, we define the inclusion map $i$ from the set of real-valued random variables
on $\Omega_1$ to the set of real-valued random variables on $\Omega_1 \times \Omega_2$. That is, for a random
variable $Y$ on $\Omega_1$, the real-valued random variable $i(Y)$ on $\Omega_1 \times \Omega_2$ is defined as

$$i(Y)(\omega_1, \omega_2) := Y(\omega_1), \quad \forall (\omega_1, \omega_2) \in \Omega_1 \times \Omega_2. \tag{2.108}$$

To see the relation with the above defined inner product, we focus on an arbitrary
real-valued random variable $Y$ on $\Omega_1$, which is given as a function of $Z_1$. Then, the
conditional expectation $\kappa_p(X)$ of $X$ satisfies

$$\langle Y, \kappa_p(X)\rangle^{(e)}_p = \sum_{\omega_1} p(Z_1 = \omega_1) Y(\omega_1) \sum_{\omega_2 \in \Omega_2} p(Z_2 = \omega_2 | Z_1 = \omega_1) X(\omega_1, \omega_2)$$
$$= \sum_{\omega_1, \omega_2} Y(\omega_1) X(\omega_1, \omega_2) p(Z_1 = \omega_1, Z_2 = \omega_2) = \langle i(Y), X\rangle^{(e)}_p. \tag{2.109}$$

In fact, when a real-valued random variable $\kappa_p(X)$ satisfies the condition (2.109) for
an arbitrary real-valued random variable $Y$ on $\Omega_1$, it is uniquely determined because
the condition (2.109) guarantees that $\kappa_p(X)$ is the image of $X$ for the dual map of $i$
with respect to the inner product $\langle Y, X\rangle^{(e)}_p$. That is, when the linear space of random
variables on $\Omega_1$ is regarded as a subspace of the linear space of random variables on
$\Omega_1 \times \Omega_2$ via the inclusion map $i$, the map $\kappa_p$ is the projection from the linear
space of random variables on $\Omega_1 \times \Omega_2$ to the linear subspace of random variables on
$\Omega_1$. So, we can regard the condition (2.109) as another definition of the conditional
expectation $\kappa_p(X)$ of $X$. That is, the conditional expectation $\kappa_p(X)$ of $X$ is the
real-valued random variable describing the behavior of the random variable $X$ of the total
system $\Omega_1 \times \Omega_2$ in the subsystem $\Omega_1$.

Generally, when we focus on a subspace $\mathcal{U}$ of real-valued random variables, for an
arbitrary random variable $X$ we can define the conditional expectation $\kappa_{\mathcal{U},p}(X) \in \mathcal{U}$
as

$$\langle Y, \kappa_{\mathcal{U},p}(X)\rangle^{(e)}_p = \langle Y, X\rangle^{(e)}_p, \quad \forall Y \in \mathcal{U}. \tag{2.110}$$

This implies that the map $\kappa_{\mathcal{U},p}(\cdot)$ is the projection from the space of all real-valued
random variables to the subspace $\mathcal{U}$ with respect to the inner product $\langle\,,\rangle^{(e)}_p$.
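The projection property (2.109) can be verified directly on a small example. In the sketch below the joint distribution and the random variables are arbitrary illustrative choices; `p[w1][w2]` stores $p(Z_1 = \omega_1, Z_2 = \omega_2)$ and `X[w1][w2]` stores $X(\omega_1, \omega_2)$.

```python
def cond_expectation(p, X):
    """kappa_p(X)(w1) = sum_{w2} p(w2 | w1) X(w1, w2), cf. (2.107)."""
    return [sum(pw * x for pw, x in zip(prow, xrow)) / sum(prow)
            for prow, xrow in zip(p, X)]

def lhs(p, X, Y):
    """<Y, kappa_p(X)> computed on Omega_1 with the marginal of Z_1."""
    k = cond_expectation(p, X)
    return sum(sum(prow) * Y[w1] * k[w1] for w1, prow in enumerate(p))

def rhs(p, X, Y):
    """<i(Y), X> computed on Omega_1 x Omega_2 with the joint distribution."""
    return sum(Y[w1] * X[w1][w2] * p[w1][w2]
               for w1 in range(len(p)) for w2 in range(len(p[w1])))

p = [[0.10, 0.20, 0.10],
     [0.25, 0.05, 0.30]]        # joint distribution on Omega_1 x Omega_2
X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]          # random variable on the total system
Y = [2.0, -1.0]                 # random variable on Omega_1

print(cond_expectation(p, X))
print(lhs(p, X, Y), rhs(p, X, Y))   # the two inner products coincide
```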


Exercises 


2.33 Show that $\mathrm{Cov}_p(X, Y) = 0$ for real-valued random variables $X$ and $Y$ if they
are independent.

2.34 Let $J_\theta$ be the Fisher information of a probability distribution family $\{p_\theta \mid \theta \in \Theta\}$.
Let $p^n_\theta$ be the n-fold independent and identical distribution of $p_\theta$. Show that the
Fisher information of the probability distribution family $\{p^n_\theta \mid \theta \in \Theta\}$ at $p^n_\theta$ is $nJ_\theta$.

2.35 Prove (2.104) using the second equality in (2.17), and noting that
$\sqrt{1+x} \cong 1 + \frac{1}{2}x - \frac{1}{8}x^2$ for small $x$.

2.36 Prove (2.105) following the steps below.
(a) Show the following approximation with the limit $\epsilon \to 0$.

$$\log p_{\theta+\epsilon}(\omega) - \log p_\theta(\omega) \cong \frac{d \log p_\theta(\omega)}{d\theta}\epsilon + \frac{1}{2}\frac{d^2 \log p_\theta(\omega)}{d\theta^2}\epsilon^2.$$

(b) Prove the first equality in (2.105) using (a).
(c) Show the following approximation with the limit $\epsilon \to 0$.

$$p_{\theta+\epsilon}(\omega) \cong p_\theta(\omega) + \frac{dp_\theta(\omega)}{d\theta}\epsilon + \frac{1}{2}\frac{d^2 p_\theta(\omega)}{d\theta^2}\epsilon^2.$$

(d) Prove the second equality in (2.105) using (a) and (c).

2.37 Prove (2.106) using the approximation $(1+x)^s \cong 1 + sx + \frac{s(s-1)}{2}x^2$ for
small $x$.

2.2.2 Bregman Divergence


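To discuss divergence from a more general viewpoint, we formulate Bregman divergence
based on a general strictly convex function $\mu(\theta)$ on $\mathbb{R}$. Assume that the strictly
convex function $\mu(\theta)$ is twice-differentiable. Then, we define the Bregman divergence
(canonical divergence) of $\mu(\theta)$ as

$$D^\mu(\bar\theta\|\theta) := \mu'(\bar\theta)(\bar\theta - \theta) - \mu(\bar\theta) + \mu(\theta)$$
$$\overset{(a)}{=} \max_{\tilde\theta}\, \mu'(\bar\theta)(\tilde\theta - \theta) - \mu(\tilde\theta) + \mu(\theta) = \int_\theta^{\bar\theta} \mu''(\tilde\theta)(\tilde\theta - \theta)\,d\tilde\theta. \tag{2.111}$$

Here (a) can be derived as follows. Since the function inside the maximum is
concave in $\tilde\theta$, the maximum is realized when the derivative is zero, which implies
that $\tilde\theta = \bar\theta$. Hence, we obtain (a). In this case, the convex function $\mu(\theta)$ is called the
potential of the Bregman divergence. Further, when $\bar\theta > \theta$, the above maximum is
replaced by $\max_{\tilde\theta: \tilde\theta \ge \theta}$.

Since the function $\mu$ is strictly convex, the correspondence $\theta \leftrightarrow \eta = \frac{d\mu}{d\theta}$ is one-to-one.
Hence, the divergence $D^\mu(\bar\theta\|\theta)$ can be expressed with the parameter $\eta$. For
this purpose, we define the Legendre transform $\nu$ of $\mu$

$$\nu(\eta) := \max_{\tilde\theta}\, \eta\tilde\theta - \mu(\tilde\theta). \tag{2.112}$$

Then, the function $\nu$ is a convex function (Exercise 2.38), and we can recover the functions $\mu$
and $\theta$ as

$$\mu(\theta) = \max_{\tilde\eta}\, \theta\tilde\eta - \nu(\tilde\eta), \qquad \theta = \frac{d\nu}{d\eta}.$$

Due to the inverse function theorem, the second derivative $\frac{d^2\nu}{d\eta^2}$ of $\nu$ is calculated to be
$\frac{d\theta}{d\eta} = \big(\frac{d\eta}{d\theta}\big)^{-1} = \big(\frac{d^2\mu}{d\theta^2}\big)^{-1}$.
In particular, when $\eta = \frac{d\mu}{d\theta}(\theta)$,

$$\nu(\eta) = \theta\eta - \mu(\theta) = D^\mu(\theta\|0) - \mu(0), \tag{2.113}$$
$$\mu(\theta) = \theta\eta - \nu(\eta) = D^\nu(\eta\|0) - \nu(0). \tag{2.114}$$

Using these relations, we can obtain

$$D^\mu(\bar\theta\|\theta) = D^\nu(\eta\|\bar\eta) = \theta(\eta - \bar\eta) - \nu(\eta) + \nu(\bar\eta), \tag{2.115}$$

where $\bar\eta = \frac{d\mu}{d\theta}(\bar\theta)$. That is, the Bregman divergence of $\mu$ can be written by the Bregman divergence of
the Legendre transform of $\mu$.

Now, we extend the Bregman divergence to the multi-parametric case. Let $\mu(\theta)$ be a
twice-differentiable and strictly convex function defined on a subset $\Theta$ of the $d$-dimensional
real vector space $\mathbb{R}^d$. The Bregman divergence concerning the convex function $\mu$ is
defined by

$$D^\mu(\bar\theta\|\theta) := \sum_k \eta_k(\bar\theta)(\bar\theta^k - \theta^k) - \mu(\bar\theta) + \mu(\theta), \qquad \eta_k(\theta) := \frac{\partial\mu}{\partial\theta^k}(\theta). \tag{2.116}$$

This quantity has the following two characterizations:

$$D^\mu(\bar\theta\|\theta) = \max_{\tilde\theta} \sum_k \eta_k(\bar\theta)(\tilde\theta^k - \theta^k) - \mu(\tilde\theta) + \mu(\theta) \tag{2.117}$$
$$= \int_0^1 \sum_{k,j} (\bar\theta^k - \theta^k)(\bar\theta^j - \theta^j) \frac{\partial^2\mu}{\partial\theta^k\partial\theta^j}(\theta + (\bar\theta - \theta)t)\, t\,dt. \tag{2.118}$$

Since the strict convexity of $\mu$ implies the strict positivity of the integrand of the above
integral, $D^\mu(\bar\theta\|\theta)$ is strictly positive unless $\bar\theta = \theta$. The strict convexity of $\mu$
also guarantees that the correspondence $\theta^k \leftrightarrow \eta_k = \frac{\partial\mu}{\partial\theta^k}$ is one-to-one. Hence, the
Bregman divergence $D^\mu(\bar\theta\|\theta)$ can be expressed with the parameter $\eta$. For this purpose,
we define the Legendre transform $\nu$ of $\mu$

$$\nu(\eta) := \max_{\tilde\theta} \sum_k \eta_k\tilde\theta^k - \mu(\tilde\theta). \tag{2.119}$$

Then, the function $\nu$ is a convex function (Exercise 2.38), and we can recover the functions $\mu$
and $\theta$ as

$$\mu(\theta) = \max_{\tilde\eta} \sum_k \theta^k\tilde\eta_k - \nu(\tilde\eta), \qquad \theta^k = \frac{\partial\nu}{\partial\eta_k}. \tag{2.120}$$

Due to the inverse function theorem, the second derivative matrix $\big(\frac{\partial^2\nu}{\partial\eta_k\partial\eta_j}\big)_{k,j}$ of $\nu$
is calculated to be $\big(\frac{\partial\theta^k}{\partial\eta_j}\big)_{k,j} = \big(\big(\frac{\partial\eta_k}{\partial\theta^j}\big)_{k,j}\big)^{-1}$, which is the inverse of the
matrix $\big(\frac{\partial^2\mu}{\partial\theta^k\partial\theta^j}\big)_{k,j}$.

In particular, when $\eta_k = \frac{\partial\mu}{\partial\theta^k}(\theta)$,

$$\nu(\eta) = \sum_k \theta^k\eta_k - \mu(\theta) = D^\mu(\theta\|0) - \mu(0), \tag{2.121}$$
$$\mu(\theta) = \sum_k \theta^k\eta_k - \nu(\eta) = D^\nu(\eta\|0) - \nu(0). \tag{2.122}$$

Using these relations, we can characterize the Bregman divergence concerning the
convex function $\mu$ by the Bregman divergence concerning the convex function $\nu$ as

$$D^\mu(\bar\theta\|\theta) = D^\nu(\eta\|\bar\eta) = \sum_k \theta^k(\eta_k - \bar\eta_k) - \nu(\eta) + \nu(\bar\eta) \tag{2.123}$$
$$= \int_0^1 \sum_{k,j} (\eta_k(\bar\theta) - \eta_k(\theta))(\eta_j(\bar\theta) - \eta_j(\theta)) \frac{\partial^2\nu}{\partial\eta_k\partial\eta_j}\big(\eta(\bar\theta) + (\eta(\theta) - \eta(\bar\theta))t\big)\, t\,dt, \tag{2.124}$$

where (2.124) follows from (2.118) for the Bregman divergence with respect to $\nu$.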
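To make the Legendre duality concrete, the sketch below works through a one-parameter example with the potential $\mu(\theta) = e^\theta$. This potential is an assumption chosen only because its Legendre transform $\nu(\eta) = \eta\log\eta - \eta$ is available in closed form; it is not taken from the text. The code checks (2.113), (2.115), and the integral expression in (2.111).

```python
import math

mu = math.exp            # illustrative potential mu(theta) = e^theta
eta_of = math.exp        # eta = mu'(theta)
theta_of = math.log      # theta = nu'(eta)

def nu(eta):
    """Legendre transform of mu: max_theta (eta*theta - e^theta) = eta*log(eta) - eta."""
    return eta * math.log(eta) - eta

def bregman_mu(theta_bar, theta):
    """D^mu(theta_bar || theta) from the definition in (2.111)."""
    return eta_of(theta_bar) * (theta_bar - theta) - mu(theta_bar) + mu(theta)

def bregman_nu(eta, eta_bar):
    """D^nu(eta || eta_bar), the same definition applied to nu."""
    return theta_of(eta) * (eta - eta_bar) - nu(eta) + nu(eta_bar)

def integral_form(theta_bar, theta, steps=20000):
    """Midpoint rule for int_theta^theta_bar mu''(t) (t - theta) dt, with mu'' = exp."""
    h = (theta_bar - theta) / steps
    return sum(math.exp(theta + (j + 0.5) * h) * (j + 0.5) * h
               for j in range(steps)) * h

theta_bar, theta = 1.2, -0.3
print(bregman_mu(theta_bar, theta))
print(bregman_nu(eta_of(theta), eta_of(theta_bar)))   # (2.115): same value
print(integral_form(theta_bar, theta))                # integral form of (2.111)
```

Note the swapped argument order in `bregman_nu`, which mirrors $D^\mu(\bar\theta\|\theta) = D^\nu(\eta\|\bar\eta)$.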
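A subset $\mathcal{E}$ of $\Theta$ is called an exponential subfamily of $\Theta$ when there exist an
element $\theta' \in \Theta$ and $l$ independent vectors $v_1, \ldots, v_l \in \mathbb{R}^d$ such that
$\mathcal{E} = \{\theta \in \Theta \mid \theta = \theta' + \sum_{j=1}^l a^j v_j,\ \exists (a^1, \ldots, a^l) \in \mathbb{R}^l\}$. A subset $\mathcal{M}$ of $\Theta$ is called a mixture subfamily
of $\Theta$ when there exist an $l$-dimensional vector $(c_1, \ldots, c_l)$ and $l$ independent vectors
$v_1, \ldots, v_l \in \mathbb{R}^d$ such that $\mathcal{M} = \{\theta \in \Theta \mid c_j = \sum_{i=1}^d v_j^i \eta_i(\theta),\ j = 1, \ldots, l\}$. In particular, the set
of vectors $\{v_1, \ldots, v_l\}$ is called a generator of $\mathcal{E}$ and $\mathcal{M}$, respectively.

Now, we focus on two points $\theta' = (\theta'^1, \ldots, \theta'^d)$ and $\theta'' = (\theta''^1, \ldots, \theta''^d)$. We
choose the exponential subfamily $\mathcal{E}$ of $\Theta$ whose natural parameters $\theta^{l+1}, \ldots, \theta^d$
are fixed to $\theta''^{l+1}, \ldots, \theta''^d$, and the mixture subfamily $\mathcal{M}$ of $\Theta$ whose expectation
parameters $\eta_1, \ldots, \eta_l$ are fixed to $\eta_1(\theta'), \ldots, \eta_l(\theta')$. Let $\tilde\theta = (\tilde\theta^1, \ldots, \tilde\theta^d)$ be an
element of the intersection of these two subfamilies of $\Theta$. That is, $\tilde\theta^i = \theta''^i$ for
$i = l+1, \ldots, d$ and $\eta_j(\tilde\theta) = \eta_j(\theta')$ for $j = 1, \ldots, l$.

Then, since

$$\eta_j(\theta')(\tilde\theta^j - \theta''^j) = \begin{cases} \eta_j(\tilde\theta)(\tilde\theta^j - \theta''^j) & \text{if } j \le l \\ 0 & \text{if } j \ge l+1, \end{cases} \tag{2.125}$$

the definition (2.116) implies that

$$D^\mu(\theta'\|\theta'') = \sum_{j=1}^d \eta_j(\theta')(\theta'^j - \theta''^j) - \mu(\theta') + \mu(\theta'')$$
$$= \sum_{j=1}^d (\theta'^j - \tilde\theta^j)\eta_j(\theta') - \mu(\theta') + \mu(\tilde\theta) + \sum_{j=1}^d (\tilde\theta^j - \theta''^j)\eta_j(\tilde\theta) - \mu(\tilde\theta) + \mu(\theta'')$$
$$= D^\mu(\theta'\|\tilde\theta) + D^\mu(\tilde\theta\|\theta''). \tag{2.126}$$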
Using (2.126), we obtain the Pythagorean theorem [2] as follows.

Theorem 2.3 (Amari [6]) Given an element $\theta \in \Theta$ and a mixture subfamily $\mathcal{M}$ of
$\Theta$ with the generator $\{v_1, \ldots, v_l\}$, we define $\theta^* := \operatorname{argmin}_{\theta' \in \mathcal{M}} D^\mu(\theta'\|\theta)$. Then, we
obtain the following two items as in Fig. 2.1.

(1) Any element $\theta' \in \mathcal{M}$ satisfies $D^\mu(\theta'\|\theta) = D^\mu(\theta'\|\theta^*) + D^\mu(\theta^*\|\theta)$.

(2) The element $\theta^*$ is the unique element of the intersection of the mixture subfamily
$\mathcal{M}$ and the exponential subfamily $\mathcal{E}$ containing $\theta$ with the generator $\{v_1, \ldots, v_l\}$.

Fig. 2.1 Pythagorean theorem (figure omitted: the exponential subfamily $\mathcal{E}$ and the mixture subfamily $\mathcal{M}$)

Proof Choose an element $\tilde\theta$ in the intersection of the mixture subfamily $\mathcal{M}$ and
the exponential subfamily $\mathcal{E}$ containing $\theta$ with the generator $\{v_1, \ldots, v_l\}$. Now, we
choose additional vectors $\{v_{l+1}, \ldots, v_d\}$ such that the set $\{v_1, \ldots, v_d\}$ forms a basis.
Then, we introduce another coordinate $a^j$ such that $\sum_j a^j v_j^i = \theta^i$. Now, we apply
the new coordinate $a^j$ to the relation (2.126). Thus, any element $\theta' \in \mathcal{M}$ satisfies
$D^\mu(\theta'\|\theta) = D^\mu(\theta'\|\tilde\theta) + D^\mu(\tilde\theta\|\theta)$. Since $D^\mu(\theta'\|\tilde\theta) > 0$ except for $\theta' = \tilde\theta$, we
have $\min_{\theta' \in \mathcal{M}} D^\mu(\theta'\|\theta) = D^\mu(\tilde\theta\|\theta)$, which implies that $\theta^* = \tilde\theta$, i.e., (2). Hence, we
obtain (1). □

We also have another version of the Pythagorean theorem as follows.

Theorem 2.4 (Amari [6]) Given an element $\theta' \in \Theta$ and an exponential subfamily
$\mathcal{E}$ of $\Theta$ with the generator $\{v_1, \ldots, v_l\}$, we define $\theta_* := \operatorname{argmin}_{\theta \in \mathcal{E}} D^\mu(\theta'\|\theta)$.

(1) Any element $\theta \in \mathcal{E}$ satisfies $D^\mu(\theta'\|\theta) = D^\mu(\theta'\|\theta_*) + D^\mu(\theta_*\|\theta)$.

(2) The element $\theta_*$ is the unique element of the intersection of the exponential
subfamily $\mathcal{E}$ and the mixture subfamily containing $\theta'$ with the generator $\{v_1, \ldots, v_l\}$.


Exercises 
2.38 Show that $\nu(\eta)$ is a convex function.


2.39 Solve Exercise 2.23 by using Theorem 2.3. 


2.2.3 Exponential Family and Divergence


In Sect. 2.1, relative entropy $D(p\|q)$ is defined. In this subsection, we characterize
it as Bregman divergence.

Let $p(\omega)$ be a probability distribution and $X(\omega)$ be a real-valued random variable.
When the family $\{p_\theta \mid \theta \in \Theta\}$ has the form

$$p_\theta(\omega) = p(\omega)e^{\theta X(\omega) - \mu(\theta)}, \tag{2.127}$$
$$\mu(\theta) := \log \sum_\omega p(\omega)e^{\theta X(\omega)}, \tag{2.128}$$

the logarithmic derivative at respective points equals the logarithmic derivative at a
fixed point with the addition of a constant. In this case, the family, $X$, and $\mu(\theta)$ are
called an exponential family, the generator, and the cumulant generating function
of $X$, respectively. In particular, in an exponential family, the logarithmic derivative
does not depend on the point $\theta$ except for constant differences. Hence, it is often called
the exponential (e) representation of the derivative. Therefore, we use the superscript
(e) in the inner product $\langle\,,\rangle^{(e)}_p$. The function $\mu(\theta)$ is often called a potential function
in the context of information geometry. Since the first derivative of $\mu(\theta)$ is calculated
as $\mu'(\theta) = \sum_\omega p(\omega)X(\omega)e^{\theta X(\omega) - \mu(\theta)} = \sum_\omega p_\theta(\omega)X(\omega)$, the second derivative

$$\mu''(\theta) = \sum_\omega p(\omega)X(\omega)^2 e^{\theta X(\omega) - \mu(\theta)} - \Big(\sum_\omega p(\omega)X(\omega)e^{\theta X(\omega) - \mu(\theta)}\Big)^2$$
$$= \sum_\omega p_\theta(\omega)X(\omega)^2 - \Big(\sum_\omega p_\theta(\omega)X(\omega)\Big)^2 = J_\theta > 0$$

is the Fisher information. So, the cumulant generating function $\mu(\theta)$ is a strictly
convex function. Therefore, the first derivative $\mu'(\theta) = \sum_\omega p_\theta(\omega)X(\omega)$ is monotone
increasing. That is, we may regard it as another parameter identifying the distribution
$p_\theta$, and denote it by $\eta$. The original parameter $\theta$ is called a natural parameter,
and the other parameter $\eta$ is an expectation parameter. When the distribution is
parametrized by the expectation parameter $\eta$, it is written as $\hat p_\eta$. Hence, we have
$\hat p_{\eta(\theta)} = p_\theta$.

For example, in the one-trial binomial distribution, the generator $X$ is given as
$X(i) = i$, and the distribution $p$ is given as $p(i) = \frac{1}{2}$ for $i = 0, 1$. Then, the cumulant
generating function $\mu$ is calculated to be $\mu(\theta) = \log\frac{1 + e^\theta}{2}$. The distribution is written
as $p_\theta(0) = 1/(1 + e^\theta)$, $p_\theta(1) = e^\theta/(1 + e^\theta)$ in the natural parameter $\theta$. Hence,
the binomial distribution is an exponential family. The expectation parameter is
$\eta(\theta) = e^\theta/(1 + e^\theta)$. That is, the distribution is written as $\hat p_\eta(1) = \eta$, $\hat p_\eta(0) = 1 - \eta$
in the expectation parameter $\eta$.

Since $\mu(\theta)$ is twice-differentiable and strictly convex, we can consider the Bregman
divergence of $\mu(\theta)$. Then, the divergence $D(p_{\bar\theta}\|p_\theta)$ can be written by using the
Bregman divergence of $\mu(\theta)$ as follows.

$$D(p_{\bar\theta}\|p_\theta) = D(\hat p_{\bar\eta}\|\hat p_\eta) = (\bar\theta - \theta)\eta(\bar\theta) - \mu(\bar\theta) + \mu(\theta)$$
$$= D^\mu(\bar\theta\|\theta) = \int_\theta^{\bar\theta} J_{\tilde\theta}(\tilde\theta - \theta)\,d\tilde\theta = \max_{\tilde\theta}\,(\tilde\theta - \theta)\eta(\bar\theta) - \mu(\tilde\theta) + \mu(\theta), \tag{2.129}$$

where the equations in (2.129) follow from (2.111). When $\bar\theta > \theta$, the above maximum
is replaced by $\max_{\tilde\theta: \tilde\theta \ge \theta}$.
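For the one-trial binomial example, the identities in (2.129) can be checked numerically. The sketch below is illustrative: it compares the relative entropy computed directly from the two distributions with the Bregman divergence of $\mu(\theta) = \log\frac{1+e^\theta}{2}$ and with the Fisher-information integral, using $J_\theta = \eta(\theta)(1 - \eta(\theta))$ (the variance of $X$ under $p_\theta$). The parameter values are arbitrary.

```python
import math

def mu(theta):
    """Cumulant generating function of X(i) = i under p(0) = p(1) = 1/2."""
    return math.log((1.0 + math.exp(theta)) / 2.0)

def eta(theta):
    """Expectation parameter eta(theta) = mu'(theta)."""
    return math.exp(theta) / (1.0 + math.exp(theta))

def p_theta(theta):
    return [1.0 - eta(theta), eta(theta)]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bregman(theta_bar, theta):
    return (theta_bar - theta) * eta(theta_bar) - mu(theta_bar) + mu(theta)

def fisher_integral(theta_bar, theta, steps=20000):
    """Midpoint rule for int_theta^theta_bar J_t (t - theta) dt."""
    h = (theta_bar - theta) / steps
    total = 0.0
    for j in range(steps):
        t = theta + (j + 0.5) * h
        total += eta(t) * (1.0 - eta(t)) * (t - theta)   # J_t = Var of X under p_t
    return total * h

theta_bar, theta = 0.8, -1.1
print(kl(p_theta(theta_bar), p_theta(theta)))
print(bregman(theta_bar, theta))
print(fisher_integral(theta_bar, theta))
```

All three printed values agree up to the discretization error of the midpoint rule.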

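Next, we consider the multi-parameter case. Let $X_1(\omega), \ldots, X_d(\omega)$ be $d$ real-valued
random variables. We can define a $d$-parameter exponential family

$$p_\theta(\omega) := p(\omega)e^{\sum_k \theta^k X_k(\omega) - \mu(\theta)}, \qquad \mu(\theta) := \log \sum_\omega p(\omega)e^{\sum_k \theta^k X_k(\omega)}. \tag{2.130}$$

The parameters $\theta^k$ are natural parameters, and the other parameters

$$\eta_k(\theta) := \frac{\partial\mu}{\partial\theta^k} = \sum_\omega p_\theta(\omega)X_k(\omega) \tag{2.131}$$

are expectation parameters. Since the second derivative $\frac{\partial^2\mu}{\partial\theta^j\partial\theta^k}$ is equal to the Fisher
information matrix $J_{\theta:k,j}$, the cumulant generating function $\mu(\theta)$ is a convex function.
Using (2.118), we obtain

$$D(p_{\bar\theta}\|p_\theta) = \int_0^1 \sum_{k,j} (\bar\theta^k - \theta^k)(\bar\theta^j - \theta^j) J_{\theta + (\bar\theta - \theta)t:k,j}\, t\,dt \tag{2.132}$$

similar to (2.129). Since the second derivative matrix $\big(\frac{\partial^2\nu}{\partial\eta_k\partial\eta_j}\big)_{k,j}$ of $\nu$ appearing in
(2.124) is the inverse of the matrix $\big(\frac{\partial^2\mu}{\partial\theta^k\partial\theta^j}\big)_{k,j}$,
the application of (2.124) yields that

$$D(p_{\bar\theta}\|p_\theta) = \int_0^1 \sum_{k,j} (\eta_k(\bar\theta) - \eta_k(\theta))(\eta_j(\bar\theta) - \eta_j(\theta)) (J_{\theta(t)}^{-1})^{k,j}\, t\,dt, \tag{2.133}$$

where $\theta(t)$ is defined by $\eta(\theta(t)) = \eta(\bar\theta) + (\eta(\theta) - \eta(\bar\theta))t$. Note that the inverse
matrix $J_\theta^{-1}$ is the Fisher information matrix with respect to the parameter $\eta$.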

In what follows, we consider the case where $p$ is the uniform distribution $p_{\mathrm{mix}}$.
Let the real-valued random variables $X_1(\omega), \ldots, X_d(\omega)$ be a basis of the space
$\mathcal{R}_0(\Omega)$ of random variables that have expectation 0 under the uniform distribution
$p_{\mathrm{mix}}$. We also choose the dual basis $Y^1(\omega), \ldots, Y^d(\omega)$ of the space $\mathcal{R}_0(\Omega)$ satisfying
$\sum_\omega Y^k(\omega)X_j(\omega) = \delta^k_j$. Then, any distribution can be parameterized by the
expectation parameter as

$$p_\theta(\omega) = \hat p_{\eta(\theta)}(\omega) = p_{\mathrm{mix}}(\omega) + \sum_k \eta_k(\theta)Y^k(\omega)$$

because $p_\theta - p_{\mathrm{mix}}$ can be regarded as an element of $\mathcal{R}_0(\Omega)$.
From (2.123) and (2.120),

$$D(\hat p_{\bar\eta}\|\hat p_\eta) = D^\nu(\eta\|\bar\eta) = \sum_k \theta^k(\eta_k - \bar\eta_k) - \nu(\eta) + \nu(\bar\eta), \tag{2.134}$$
$$\nu(\eta) = D(\hat p_\eta\|p_{\mathrm{mix}}) = -H(\hat p_\eta) + H(p_{\mathrm{mix}}) \tag{2.135}$$

because $\mu(0) = 0$. The second derivative matrix of $\nu$ is the inverse of the second
derivative matrix of $\mu$, i.e., the Fisher information matrix concerning the natural
parameter $\theta$. That is, the second derivative matrix of $\nu$ coincides with the Fisher
information matrix concerning the expectation parameter $\eta$.

Now, for given distributions $p$ and $q$, we consider the case when $Y^1(\omega) = q(\omega) - p(\omega)$.
In this case, the distribution $p_t := (1-t)p + tq$ $(0 \le t \le 1)$ depends only on the
first expectation parameter $\eta_1$. The other expectation parameters $\eta_k$ are constants for
the distribution $p_t$. Hence, $\eta_1(p_t) - \eta_1(p_{t'}) = t - t'$ and $\eta_k(p_t) - \eta_k(p_{t'}) = 0$ for
$k \ge 2$. Thus, as a special case of (2.133), we have

$$D(p\|q) = \int_0^1 J_t\, t\,dt, \tag{2.136}$$

where $J_t$ is the Fisher information for the parameter $t$.
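Relation (2.136) is easy to test numerically. In the sketch below the two distributions are arbitrary illustrative choices with full support, the Fisher information of the mixture family $p_t = (1-t)p + tq$ is computed as $J_t = \sum_\omega (q(\omega) - p(\omega))^2 / p_t(\omega)$, and the integral is approximated by the midpoint rule.

```python
import math

def kl(p, q):
    """Relative entropy D(p||q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fisher_t(p, q, t):
    """Fisher information of p_t = (1-t) p + t q with respect to t:
    the logarithmic derivative is (q - p) / p_t, so J_t = sum (q - p)^2 / p_t."""
    return sum((qi - pi) ** 2 / ((1 - t) * pi + t * qi) for pi, qi in zip(p, q))

def integral_2_136(p, q, steps=20000):
    """Midpoint rule for int_0^1 J_t t dt."""
    h = 1.0 / steps
    return sum(fisher_t(p, q, (j + 0.5) * h) * (j + 0.5) * h
               for j in range(steps)) * h

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
print(kl(p, q), integral_2_136(p, q))   # the two values agree up to discretization error
```

The weight $t$ in the integrand makes the formula asymmetric in $p$ and $q$, which matches the asymmetry of the relative entropy.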


2.3 Estimation in Classical Systems 


An important problem in mathematical statistics is the estimation of the parameter $\theta$ from some given data $\omega \in \Omega$ for a probability distribution that generates the data. To solve this problem, a mapping $\hat\theta$ called an estimator from the probability space $\Omega$ to the parameter space $\Theta \subset \mathbb{R}$ is required. The accuracy of the estimator is most commonly evaluated by the mean square error, which is the expectation of the square of the difference $\hat\theta - \theta$:

$$ \hat{V}_\theta(\hat\theta) := E_{p_\theta}((\hat\theta - \theta)^2), \tag{2.137} $$

where $\theta$ is the true parameter. Note that the mean square error does not always coincide with the variance $V_{p_\theta}(\hat\theta)$. An estimator satisfying

$$ E_\theta(\hat\theta) := E_{p_\theta}(\hat\theta) = \theta, \quad \forall\theta \in \Theta \tag{2.138} $$

is called an unbiased estimator, and such estimators form an important class of estimators. The mean square error of an unbiased estimator $\hat\theta$ satisfies the Cramér-Rao inequality

$$ \hat{V}_\theta(\hat\theta) \ge J_\theta^{-1}. \tag{2.139} $$


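When an unbiased estimator attains the RHS of (2.139), it is called efficient. This inequality can be proved from the relations

$$ \langle \hat\theta - \theta_0,\ l_{\theta_0}\rangle^{(e)}_{p_{\theta_0}} = \left.\frac{dE_\theta(\hat\theta - \theta_0)}{d\theta}\right|_{\theta=\theta_0} = 1 $$

and

$$ \langle \hat\theta - \theta_0,\ \hat\theta - \theta_0\rangle^{(e)}_{p_{\theta_0}} \langle l_{\theta_0},\ l_{\theta_0}\rangle^{(e)}_{p_{\theta_0}} \ge \left|\langle \hat\theta - \theta_0,\ l_{\theta_0}\rangle^{(e)}_{p_{\theta_0}}\right|^2 = 1, \tag{2.140} $$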
which follows from Schwarz's inequality. The equality of (2.139) holds for every value of $\theta$ if and only if the probability distribution family is a one-parameter exponential family (2.127) and the expectation parameter $\eta(\theta) = \sum_\omega X(\omega)p_\theta(\omega)$ is to be estimated. In this case, the efficient estimator for the expectation parameter is given as $\hat\eta(\omega) := X(\omega)$ (Exercise 2.40). Even in the estimation for an exponential family, there does not necessarily exist an estimator for the natural parameter $\theta$ in (2.127) such that the equality of (2.139) holds for all $\theta$.

Let $n$ data $\omega^n = (\omega_1,\ldots,\omega_n) \in \Omega^n$ be generated by the $n$-fold i.i.d. extension of the probability distribution $p_\theta$. The estimator may then be given by a mapping $\hat\theta_n$ from $\Omega^n$ to $\Theta \subset \mathbb{R}$. In this case, the Fisher information of the probability distribution family is $nJ_\theta$, and an unbiased estimator $\hat\theta_n$ satisfies the Cramér-Rao inequality

$$ \hat{V}_\theta(\hat\theta_n) \ge \frac{1}{n}J_\theta^{-1}. $$

However, in general, it is not necessary to restrict our estimator to unbiased estimators. In fact, few estimators satisfy the unbiasedness condition for finite $n$.
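
As a concrete illustration of the i.i.d. Cramér-Rao bound, the sketch below computes exact mean square errors for the Bernoulli family $p_\theta(1) = \theta$, whose Fisher information is $1/(\theta(1-\theta))$. The sample mean is unbiased and attains the bound, while the shrunk estimator `(k + 1) / (n + 2)` is a hypothetical biased estimator introduced here only to show that a biased estimator can have a smaller mean square error at some parameter values.

```python
import math

def bernoulli_fisher(theta):
    """Fisher information J_theta of the family p_theta(1) = theta."""
    return 1.0 / (theta * (1.0 - theta))

def exact_mse(estimator, theta, n):
    """Exact E[(estimator(k, n) - theta)^2] when k ~ Binomial(n, theta)."""
    return sum(
        math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)
        * (estimator(k, n) - theta) ** 2
        for k in range(n + 1)
    )

def sample_mean(k, n):
    # unbiased estimator of theta
    return k / n

def shrunk_mean(k, n):
    # biased estimator, assumed here purely for illustration
    return (k + 1) / (n + 2)

n = 20
cramer_rao_bound = 1.0 / (n * bernoulli_fisher(0.3))
mse_unbiased = exact_mse(sample_mean, 0.3, n)        # attains the bound
bound_at_half = 1.0 / (n * bernoulli_fisher(0.5))
mse_biased_at_half = exact_mse(shrunk_mean, 0.5, n)  # falls below the bound
print(mse_unbiased, cramer_rao_bound, mse_biased_at_half, bound_at_half)
```

The second comparison does not contradict the Cramér-Rao inequality, because the inequality is stated only for unbiased estimators.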

Therefore, in mathematical statistics, we often study problems in the asymptotic limit $n \to \infty$ rather than those with a finite number of data elements. For this purpose, let us apply the asymptotic unbiasedness conditions

$$ \lim_{n\to\infty} E_\theta(\hat\theta_n) = \theta, \quad \lim_{n\to\infty} \frac{d}{d\theta}E_\theta(\hat\theta_n) = 1, \quad \forall\theta \in \Theta \tag{2.141} $$

to a sequence of estimators $\{\hat\theta_n\}$. Evaluating the accuracy with $\lim n\hat{V}_\theta(\hat\theta_n)$, we have the asymptotic Cramér-Rao inequality⁸:

$$ \lim n\hat{V}_\theta(\hat\theta_n) \ge J_\theta^{-1}, \tag{2.142} $$

which is shown as follows. Based on a derivation similar to (2.139), we obtain

$$ nJ_\theta \hat{V}_\theta(\hat\theta_n) \ge \left|\frac{d}{d\theta}E_\theta(\hat\theta_n)\right|^2. \tag{2.143} $$

Combining (2.141) and (2.143) yields inequality (2.142).
Now, we consider what estimator attains the lower bound of (2.142). The maximum likelihood estimator $\hat\theta_{n,ML}(\omega^n)$

$$ \hat\theta_{n,ML}(\omega^n) := \operatorname*{argmax}_{\theta\in\Theta}\ p^n_\theta(\omega^n) \tag{2.144} $$


⁸ This inequality still holds even if the asymptotic unbiasedness condition is replaced by another weak condition. Indeed, choosing a suitable condition to assume for inequality (2.142) is itself a nontrivial problem. For details, see van der Vaart [7].




achieves this lower bound, and the limit of $n$ times its mean square error is equal to $J_\theta^{-1}$ [7]. Indeed, in an exponential family with the expectation parameter, the maximum likelihood estimator is equal to the efficient estimator. Hence, the maximum likelihood estimator plays an important role in statistical inference.⁹

Indeed, we choose the mean square error as the criterion of estimation error because (1) its mathematical treatment is easy and (2) in the i.i.d. case, the sample mean can be characterized by a Gaussian distribution. Hence, we can expect that a suitable estimator will also approach a Gaussian distribution asymptotically. That is, we can expect that its asymptotic behavior will be characterizable by the variance. In particular, the maximum likelihood estimator $\hat\theta_{n,ML}$ obeys the Gaussian distribution asymptotically:

$$ p^n_\theta\{a \le \sqrt{n}(\hat\theta_{n,ML} - \theta) \le b\} \to \int_a^b P_{G,1/J_\theta}(x)\,dx, \quad \forall a, b, $$

where $P_{G,1/J_\theta}$ denotes the density of the Gaussian distribution with mean 0 and variance $1/J_\theta$.


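Let us now consider the probability distribution family $\{p_\theta|\theta \in \Theta \subset \mathbb{R}^d\}$ with multiple parameters. We focus on the Fisher information matrix $J_\theta = (J_{\theta;k,j})$, which was defined at the end of Sect. 2.2.1, instead of the Fisher information. The estimator is given by the map $\hat\theta = (\hat\theta^1,\ldots,\hat\theta^d)$ from the probability space $\Omega$ to the parameter space $\Theta$, similar to the one-parameter case. The unbiasedness conditions are

$$ E^k_\theta(\hat\theta) := E_{p_\theta}(\hat\theta^k) = \theta^k, \quad \forall\theta \in \Theta,\ 1 \le \forall k \le d. $$

The error can be calculated using the mean square error matrix $\hat{V}_\theta(\hat\theta) = (\hat{V}^{k,j}_\theta(\hat\theta))$:

$$ \hat{V}^{k,j}_\theta(\hat\theta) := E_{p_\theta}((\hat\theta^k - \theta^k)(\hat\theta^j - \theta^j)). $$

Then, we obtain the multiparameter Cramér-Rao inequality

$$ \hat{V}_\theta(\hat\theta) \ge J_\theta^{-1}. \tag{2.145} $$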

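Proof of (2.145) For the proof, let us assume that any vectors $|b\rangle = (b_1,\ldots,b_d)^T \in \mathbb{C}^d$ and $|a\rangle \in \mathbb{C}^d$ satisfy

$$ \langle b|\hat{V}_\theta(\hat\theta)|b\rangle \langle a|J_\theta|a\rangle \ge |\langle b|a\rangle|^2. \tag{2.146} $$

By substituting $a = (J_\theta)^{-1}b$, inequality (2.146) becomes

$$ \langle b|\hat{V}_\theta(\hat\theta)|b\rangle \ge \langle b|(J_\theta)^{-1}|b\rangle $$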


⁹ This is generally true for all probability distribution families, although some regularity conditions must be imposed. For example, consider the case in which $\Omega$ consists of finite elements. These regularity conditions are satisfied when the first and second derivatives with respect to $\theta$ are continuous. Generally, the central limit theorem is used in the proof [7].




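since $(J_\theta)^{-1}$ is a symmetric matrix. Therefore, we obtain (2.145) if (2.146) holds. Now, we prove (2.146) as follows. Since

$$ \left\langle l_{\theta_0;j},\ \hat\theta^k - \theta_0^k\right\rangle^{(e)}_{p_{\theta_0}} = \left.\frac{\partial E^k_\theta(\hat\theta - \theta_0)}{\partial\theta^j}\right|_{\theta=\theta_0} = \delta^k_j $$

similarly to the proof of (2.139), the Schwarz inequality yields

$$ \langle b|\hat{V}_{\theta_0}(\hat\theta)|b\rangle = \Big\langle \sum_{k=1}^d (\hat\theta^k - \theta_0^k)b_k,\ \sum_{k=1}^d (\hat\theta^k - \theta_0^k)b_k\Big\rangle^{(e)}_{p_{\theta_0}} \ge \frac{\Big|\Big\langle \sum_{j=1}^d l_{\theta_0;j}a_j,\ \sum_{k=1}^d (\hat\theta^k - \theta_0^k)b_k\Big\rangle^{(e)}_{p_{\theta_0}}\Big|^2}{\Big\langle \sum_{j=1}^d l_{\theta_0;j}a_j,\ \sum_{j=1}^d l_{\theta_0;j}a_j\Big\rangle^{(e)}_{p_{\theta_0}}} = \frac{|\langle a|b\rangle|^2}{\langle a|J_{\theta_0}|a\rangle}. $$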

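Moreover, when the sequence of estimators $\{\hat\theta_n = (\hat\theta^1_n,\ldots,\hat\theta^d_n)\}$ satisfies the asymptotic unbiasedness condition

$$ \lim_{n\to\infty} E^k_\theta(\hat\theta_n) = \theta^k, \quad \lim_{n\to\infty} \frac{\partial}{\partial\theta^j}E^k_\theta(\hat\theta_n) = \delta^k_j, \quad \forall\theta \in \Theta, \tag{2.147} $$

the asymptotic Cramér-Rao inequality for the multiparameter case

$$ \hat{V}_\theta(\{\hat\theta_n\}) \ge J_\theta^{-1} \tag{2.148} $$

holds if the limit $\hat{V}_\theta(\{\hat\theta_n\}) := \lim_{n\to\infty} n\hat{V}_\theta(\hat\theta_n)$ exists. Next, we prove (2.148). Defining $A^k_{n,j} := \frac{\partial}{\partial\theta^j}E^k_\theta(\hat\theta_n)$, we have

$$ n\langle a|J_\theta|a\rangle \langle b|\hat{V}_\theta(\hat\theta_n)|b\rangle \ge |\langle a|A_n|b\rangle|^2 $$

instead of (2.146). We then obtain

$$ \langle a|J_\theta|a\rangle \langle b|\hat{V}_\theta(\{\hat\theta_n\})|b\rangle \ge |\langle a|b\rangle|^2, $$

from which (2.148) may be obtained in a manner similar to (2.145).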
Similarly to the one-parameter case, the equality of (2.145) holds if and only if the following conditions hold: (1) The probability distribution family is a multiparameter exponential family. (2) The expectation parameter $\eta$ is to be estimated. (3) The estimator for $\eta$ is given by

$$ \hat\eta_k(\omega) = X_k(\omega). \tag{2.149} $$
In this case, this estimator (2.149) equals the maximum likelihood estimator $\hat\theta_{ML} = (\hat\theta^1_{ML},\ldots,\hat\theta^d_{ML})$ defined by (2.144) (Exercise 2.41), i.e.,

$$ \max_\eta p_\eta(\omega) = p_{X(\omega)}(\omega). \tag{2.150} $$


A probability distribution family does not necessarily have such an estimator; however, a maximum likelihood estimator $\hat\theta_{n,ML}$ can be defined by (2.144). This satisfies the asymptotic unbiasedness property (2.147) in a similar way to (2.144), and it satisfies the equality of (2.148). Moreover, it is known that the maximum likelihood estimator $\hat\theta_{n,ML}$ satisfies [7]

$$ \hat{V}_\theta(\{\hat\theta_{n,ML}\}) = J_\theta^{-1}. $$


Note that this inequality holds independently of the choice of coordinate. Hence, for a large amount of data, it is best to use the maximum likelihood estimator. Its mean square error matrix is approximately inversely proportional to the number of observations $n$, and the coefficient in the optimal case is given by the inverse of the Fisher information matrix. Therefore, the Fisher information matrix can be considered to yield the best accuracy of an estimator.

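Indeed, usually any statistical decision with the given probability distribution family $\{q_\gamma|\gamma \in \Gamma\}$ is based on the likelihood ratio $\log q_\gamma(\omega) - \log q_{\gamma'}(\omega)$. For example, the maximum likelihood estimator depends only on the likelihood ratio. A probability distribution family $\{q_\gamma|\gamma \in \Gamma\}$ is called a curved exponential family when it belongs to a larger multiparameter exponential family $\{p_\theta|\theta \in \Theta\}$, i.e., $q_\gamma$ is given as $p_{\theta(\gamma)}$ with use of a function $\theta(\gamma)$. When $p_\theta(\omega)$ is given by (2.130), the likelihood ratio can be expressed by the relative entropy

$$
\begin{aligned}
\log q_\gamma(\omega) - \log q_{\gamma'}(\omega) &= \log p_{\theta(\gamma)}(\omega) - \log p_{\theta(\gamma')}(\omega) \\
&= \sum_k (\theta^k(\gamma) - \theta^k(\gamma'))X_k(\omega) - \mu(\theta(\gamma)) + \mu(\theta(\gamma')) \\
&= \sum_k X_k(\omega)(\theta''^k - \theta^k(\gamma')) + \mu(\theta(\gamma')) - \mu(\theta'') \\
&\quad - \Big(\sum_k X_k(\omega)(\theta''^k - \theta^k(\gamma)) + \mu(\theta(\gamma)) - \mu(\theta'')\Big) \\
&= D(p_{X(\omega)}\|q_{\gamma'}) - D(p_{X(\omega)}\|q_\gamma),
\end{aligned} \tag{2.151}
$$

where $\theta''$ is chosen as $\eta_k(\theta'') = X_k(\omega)$. That is, our estimation procedure can be treated from the viewpoint of the relative entropy geometry.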
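
A simple instance of this relation can be checked directly. The sketch below assumes i.i.d. data on a finite set, for which the empirical distribution plays the role of $p_{X(\omega)}$ and the log-likelihood ratio equals $n$ times a difference of relative entropies. The data and the two candidate distributions `q1` and `q2` are made up for illustration.

```python
import math
from collections import Counter

def relative_entropy(p, q):
    """D(p||q) with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def log_likelihood(data, q):
    """log q^n(data) for i.i.d. data with values in range(len(q))."""
    return sum(math.log(q[w]) for w in data)

def empirical_distribution(data, d):
    counts = Counter(data)
    return [counts[w] / len(data) for w in range(d)]

# Hypothetical observations and candidate distributions.
data = [0, 2, 1, 0, 0, 2, 1, 0]
q1 = [0.5, 0.25, 0.25]
q2 = [0.2, 0.3, 0.5]

emp = empirical_distribution(data, 3)
likelihood_ratio = log_likelihood(data, q1) - log_likelihood(data, q2)
entropy_difference = len(data) * (relative_entropy(emp, q2) - relative_entropy(emp, q1))
print(likelihood_ratio, entropy_difference)
```

In this picture, maximizing the likelihood over a family amounts to choosing the member closest to the empirical distribution in relative entropy.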
Exercises 


2.40 Show that the following two conditions are equivalent for a probability distribution family $\{p_\theta|\theta \in \mathbb{R}\}$ and its estimator $X$ by following the steps below.

① There exists a parameter $\eta$ such that the estimator $X$ is an unbiased estimator for the parameter $\eta$ and the equality of (2.139) holds at all points.
② The probability distribution family $\{p_\theta|\theta \in \mathbb{R}\}$ is an exponential family, $p_\theta(\omega)$ is given by (2.127) using $X$, and the parameter to be estimated is the expectation parameter $\eta(\theta)$.


(a) Show that the estimator $X$ is an unbiased estimator of the expectation parameter under the exponential family (2.127).

(b) Show that ① may be deduced from ②.

(c) For the exponential family (2.127), show that the natural parameter $\theta$ is given as a function of the expectation parameter $\eta$ with the form $\theta = \int_0^\eta J_{\eta'}\,d\eta'$.

(d) Show that $\mu(\theta(\eta)) = \int_0^\eta \eta' J_{\eta'}\,d\eta'$.

(e) Show that $\frac{l_\eta}{J_\eta} = X - \eta$ if ① is true.

(f) Show that $\frac{dp_\eta}{d\eta} = J_\eta(X - \eta)p_\eta$ if ① is true.

(g) Show that ② is true if ① is true.


2.41 Show equation (2.150) from (2.151). 


2.42 Consider the probability distribution family $\{p_\theta|\theta \in \mathbb{R}\}$ in the probability space $\{1,\ldots,l\}$ and the stochastic transition matrix $Q = (Q^i_j)$. Let the Fisher information of $p_{\theta_0}$ in the probability distribution family $\{p_\theta|\theta \in \mathbb{R}\}$ be $J_{\theta_0}$. Let $J'_{\theta_0}$ be the Fisher information of $Q(p_{\theta_0})$ in the probability distribution family $\{Q(p_\theta)|\theta \in \mathbb{R}\}$. Show then that $J_{\theta_0} \ge J'_{\theta_0}$. This inequality is called the monotonicity of the Fisher information. Similarly, define $J_{\theta_0}$, $J'_{\theta_0}$ for the multiple variable case, and show that the matrix inequality $J_{\theta_0} \ge J'_{\theta_0}$ holds.


2.4 Type Method and Large Deviation Evaluation 


In this section, we analyze the case of a sufficiently large number of data by using 
the following two methods. The first method involves an analysis based on empirical 
distributions, and it is called the type method. In the second method, we consider a 
particular random variable and examine its exponential behavior. 


2.4.1 Type Method and Sanov’s Theorem 


Let $n$ data be generated according to a probability distribution in a finite set of events $\mathbb{N}_d = \{1,\ldots,d\}$. Then, we can perform the following analysis by examining the empirical distribution of the data [8]. Let $T_n$ be the set of empirical distributions obtained from $n$ observations. We call each element of this set a type. For each type $q \in T_n$, let the subset $T^n_q \subset \mathbb{N}^n_d$ be the set of data with the empirical distribution $q$. Since the probability $p^n(i)$ depends only on the type $q$ for each $i \in T^n_q$, we can denote this probability by $p^n(q)$. Then, when the $n$ data are generated according to the probability distribution $p^n$, the empirical distribution matches $q \in T_n$ with the probability $p^n(T^n_q)$ $(= \sum_{i\in T^n_q} p^n(i))$.
Theorem 2.5 Any types $p, q \in T_n$ and any data $i \in T^n_q$ satisfy the following:

$$ p^n(T^n_q) \le p^n(T^n_p), \tag{2.152} $$
$$ p^n(i) = e^{-n(H(q) + D(q\|p))}. \tag{2.153} $$

Denoting the number of elements of $T_n$ and $T^n_q$ by $|T_n|$ and $|T^n_q|$, respectively, we obtain the relations

$$ |T_n| \le (n+1)^{d-1}, \tag{2.154} $$
$$ \frac{1}{(n+1)^d}e^{nH(q)} \le |T^n_q| \le e^{nH(q)}, \tag{2.155} $$
$$ \frac{1}{(n+1)^d}e^{-nD(q\|p)} \le p^n(T^n_q) \le e^{-nD(q\|p)}. \tag{2.156} $$
n 


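Proof Let $p(i) = \frac{n_i}{n}$ and $q(i) = \frac{n'_i}{n}$. Then,

$$ p^n(T^n_p) = |T^n_p| \prod_{i=1}^d p(i)^{n_i} = \frac{n!}{\prod_{i=1}^d n_i!} \prod_{i=1}^d \Big(\frac{n_i}{n}\Big)^{n_i}, $$

$$ p^n(T^n_q) = |T^n_q| \prod_{i=1}^d p(i)^{n'_i} = \frac{n!}{\prod_{i=1}^d n'_i!} \prod_{i=1}^d \Big(\frac{n_i}{n}\Big)^{n'_i}. $$

Using the inequality (Exercise 2.43)

$$ \frac{n!}{m!} \le n^{n-m}, \tag{2.157} $$

we have

$$ \frac{p^n(T^n_q)}{p^n(T^n_p)} = \prod_{i=1}^d \frac{n_i!}{n'_i!}\Big(\frac{n_i}{n}\Big)^{n'_i - n_i} \le \prod_{i=1}^d n_i^{\,n_i - n'_i}\Big(\frac{n_i}{n}\Big)^{n'_i - n_i} = \prod_{i=1}^d \Big(\frac{1}{n}\Big)^{n'_i - n_i} = \Big(\frac{1}{n}\Big)^{\sum_{i=1}^d (n'_i - n_i)} = 1. $$

Therefore, inequality (2.152) holds. For $\vec{i} \in T^n_q$, we have

$$ p^n(\vec{i}) = \prod_{i=1}^d p(i)^{n'_i} = \prod_{i=1}^d p(i)^{nq(i)} = \prod_{i=1}^d e^{nq(i)\log p(i)} = e^{n\sum_{i=1}^d q(i)\log p(i)} = e^{-n(H(q) + D(q\|p))}, $$

which implies (2.153).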
Each element $q$ of $T_n$ may be written as a $d$-dimensional vector. Each component of the vector then assumes one of the following $n+1$ values: $0, 1/n, \ldots, n/n$. Since $\sum_{i=1}^d q_i = 1$, the $d$th element is decided by the other $d-1$ elements. Therefore, inequality (2.154) follows from a combinatorial observation. Applying equation (2.153) to the case $p = q$, we have the relation $p^n(T^n_p) = e^{-nH(p)}|T^n_p|$. Since $1 = \sum_{q\in T_n} p^n(T^n_q) \ge p^n(T^n_p)$ for $p \in T_n$, we obtain the inequality on the RHS of (2.155). Conversely, inequality (2.152) yields that $1 = \sum_{q\in T_n} p^n(T^n_q) \le \sum_{q\in T_n} p^n(T^n_p) = e^{-nH(p)}|T^n_p||T_n|$. Combining this relation with (2.154), we obtain the inequality on the LHS of (2.155). Inequality (2.156) may be obtained by combining (2.153) and (2.155). ∎
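
The relations of Theorem 2.5 can be verified by brute force for small $n$ and $d$. The following sketch enumerates all types as count vectors and checks (2.153) to (2.156), using the factor $(n+1)^d$ in the lower bounds. The helper names and the example distribution are assumptions made for illustration, and the distribution must be strictly positive.

```python
import math

def count_vectors(n, d):
    """Yield every (n_1, ..., n_d) of nonnegative integers summing to n (one per type)."""
    if d == 1:
        yield (n,)
        return
    for k in range(n + 1):
        for rest in count_vectors(n - k, d - 1):
            yield (k,) + rest

def entropy(q):
    return -sum(x * math.log(x) for x in q if x > 0)

def relative_entropy(q, p):
    return sum(x * math.log(x / y) for x, y in zip(q, p) if x > 0)

def type_class_size(counts):
    """|T_q^n| = n! / (n_1! ... n_d!), a multinomial coefficient."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)
    return size

def check_type_bounds(n, p, slack=1e-9):
    d = len(p)
    all_types = list(count_vectors(n, d))
    ok = len(all_types) <= (n + 1) ** (d - 1)                      # (2.154)
    for counts in all_types:
        q = [c / n for c in counts]
        size = type_class_size(counts)
        prob_one = math.prod(pi ** c for pi, c in zip(p, counts))  # p^n(i), i in T_q^n
        exponent = math.exp(-n * (entropy(q) + relative_entropy(q, p)))
        ok = ok and math.isclose(prob_one, exponent, rel_tol=1e-9)  # (2.153)
        upper_size = math.exp(n * entropy(q))
        ok = ok and upper_size / (n + 1) ** d <= size * (1 + slack)  # (2.155), LHS
        ok = ok and size <= upper_size * (1 + slack)                 # (2.155), RHS
        prob_class = size * prob_one
        upper_prob = math.exp(-n * relative_entropy(q, p))
        ok = ok and upper_prob / (n + 1) ** d <= prob_class * (1 + slack)  # (2.156), LHS
        ok = ok and prob_class <= upper_prob * (1 + slack)                 # (2.156), RHS
    return len(all_types), ok

number_of_types, bounds_hold = check_type_bounds(6, [0.5, 0.3, 0.2])
print(number_of_types, bounds_hold)
```

The small `slack` factor only absorbs floating-point rounding in the cases where a bound holds with equality, such as a type concentrated on a single symbol.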


We obtain Sanov’s Theorem using these inequalities. 


Theorem 2.6 (Sanov [9]) The following holds for a subset $R$ of distributions on $\mathbb{N}_d$:

$$ \frac{1}{(n+1)^d}\exp\Big(-n\min_{q\in R\cap T_n} D(q\|p)\Big) \le p^n\big(\cup_{q\in R\cap T_n} T^n_q\big) \le (n+1)^d \exp\Big(-n\inf_{q\in R} D(q\|p)\Big). $$

In particular, when the closure of the interior of $R$ coincides with the closure of $R$,¹⁰

$$ \lim_{n\to\infty} \frac{-1}{n}\log p^n\big(\cup_{q\in R\cap T_n} T^n_q\big) = \inf_{q\in R} D(q\|p). $$


Based on this theorem, we can analyze how different the true distribution is from the empirical distribution. More precisely, the empirical distribution belongs to the neighborhood of the true distribution with a sufficiently large probability, i.e., the probability of its complementary event approaches 0 exponentially. This exponent is then given by the relative entropy. The discussion of this exponent is called a large deviation evaluation.
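
The exponential decay described by Sanov's theorem can be observed in the simplest case $d = 2$. The sketch below assumes a Bernoulli source with $p(1) = 0.3$ and the set $R$ of distributions with $q(1) \ge 0.5$. It computes the exact probability that the empirical distribution lies in $R$ and compares the resulting rate with $\inf_{q\in R} D(q\|p) = D((0.5, 0.5)\|p)$. All names and parameter values are illustrative.

```python
import math

def log_binomial_pmf(n, k, theta):
    """log of C(n, k) theta^k (1 - theta)^(n - k), computed via lgamma to avoid underflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(theta) + (n - k) * math.log(1.0 - theta))

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def binary_relative_entropy(a, b):
    """D(q||p) for q = (1 - a, a) and p = (1 - b, b)."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total

def empirical_rate(n, theta, k_min):
    """-(1/n) log p^n{number of ones >= k_min}."""
    logs = [log_binomial_pmf(n, k, theta) for k in range(k_min, n + 1)]
    return -log_sum_exp(logs) / n

theta = 0.3
sanov_exponent = binary_relative_entropy(0.5, theta)
rates = {n: empirical_rate(n, theta, n // 2) for n in (50, 400)}  # n even, so 1/2 is a type
print(sanov_exponent, rates)
```

For even $n$ both bounds of Theorem 2.6 are centered at the same exponent, so the computed rate differs from it by at most $d\log(n+1)/n$ with $d = 2$.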

However, it is difficult to consider a quantum extension of Sanov's theorem. This is because we cannot necessarily take common eigenvectors for multiple density matrices. That is, this problem must be treated independently of the choice of basis. One possible way to fulfill this requirement is the group representation method. If we use this method, it is possible to treat the eigenvalues of the density matrix of the system instead of the classical probabilities [10, 11]. Since eigenvalues do not identify the density matrix, this result cannot be regarded as the complete quantum extension of Sanov's theorem. Indeed, a quantum extension is available if we focus only on two densities; however, it should be regarded as the quantum extension of Stein's lemma given in Sect. 3.5. Since the data are not given without our operation in the quantum case, it is impossible to directly extend Sanov's theorem to the quantum case.

¹⁰ The interior of a set $X$ is the set consisting of the elements of $X$ without its boundary. For example, for a one-dimensional set, the interior of $[0, 0.5] \cup \{0.7\}$ is $(0, 0.5)$ and the closure of the interior is $[0, 0.5]$. Therefore, the condition is not satisfied in this case.

In fact, the advantage of using the type method is the universality in information 
theory [8]. However, if we apply the type method to quantum systems independently 
of the basis, the universality is not available in the quantum case. A group repre- 
sentation method is very effective for a treatment independent of basis [10, 12-17]. 
Indeed, several universal protocols have been obtained by this method. 


Exercise 


2.43 Prove (2.157) by considering the cases $n \ge m$ and $n < m$ separately.


2.4.2 Cramér Theorem and Its Application to Estimation


Next, we consider the asymptotic behavior of a random variable in the case of independent and identical trials of the probability distribution $p$.

For this purpose, we first introduce two fundamental inequalities (Exercise 2.44). The Markov inequality states that for a real-valued random variable $X$ where $X \ge 0$,

$$ p\{X \ge c\} \le \frac{E_p(X)}{c}. \tag{2.158} $$

Applying the Markov inequality to the variable $|X - E_p(X)|^2$, we obtain the Chebyshev inequality:

$$ p\{|X - E_p(X)| \ge a\} \le \frac{V_p(X)}{a^2}. \tag{2.159} $$

Now, consider the real-valued random variable

$$ X^n := \frac{1}{n}\sum_{i=1}^n X_i, \tag{2.160} $$

where $X_1,\ldots,X_n$ are $n$ independent random variables that are identical to the real-valued random variable $X$ subject to the distribution $p$. When the variable $X^n$ obeys the independent and identical distribution $p^n$ of $p$, the expectation of $X^n$ coincides with the expectation $E_p(X)$. Let $V_p(X)$ be the variance of $X$. Then, its variance with $n$ observations equals $V_p(X)/n$.

Applying Chebyshev's inequality (2.159), we have

$$ p^n\{|X^n - E_p(X)| \ge \epsilon\} \le \frac{V_p(X)}{n\epsilon^2} $$

for arbitrary $\epsilon > 0$. This inequality yields the (weak) law of large numbers

$$ p^n\{|X^n - E_p(X)| \ge \epsilon\} \to 0, \quad \forall\epsilon > 0. \tag{2.161} $$

In general, if a sequence of pairs $\{(X_n, p_n)\}$ of a real-valued random variable and a probability distribution satisfies

$$ p_n\{|X_n - x| \ge \epsilon\} \to 0, \quad \forall\epsilon > 0 \tag{2.162} $$

for a real number $x$, then the real-valued random variable $X_n$ is said to converge in probability to $x$.

Since the left-hand side (LHS) of (2.161) converges to 0, the next focus is the speed of this convergence. Usually, this convergence is exponential. The exponent of this convergence is characterized by Cramér's Theorem below.
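
To see how loose the Chebyshev bound is compared with the actual decay, the sketch below computes the exact deviation probability for the mean of $n$ Bernoulli trials and compares it with $V_p(X)/(n\epsilon^2)$. The parameter values are assumptions chosen for illustration. Exact rational arithmetic is used for the event $|X^n - E_p(X)| \ge \epsilon$ so that boundary cases are not misclassified by rounding.

```python
import math
from fractions import Fraction

def log_binomial_pmf(n, k, theta):
    """log of C(n, k) theta^k (1 - theta)^(n - k)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(theta) + (n - k) * math.log(1.0 - theta))

def deviation_probability(n, theta, eps):
    """Exact p^n{|X^n - theta| >= eps}; theta and eps are Fractions."""
    t = float(theta)
    return sum(math.exp(log_binomial_pmf(n, k, t))
               for k in range(n + 1)
               if abs(Fraction(k, n) - theta) >= eps)

def chebyshev_bound(n, theta, eps):
    """V_p(X) / (n eps^2) with V_p(X) = theta (1 - theta)."""
    return float(theta * (1 - theta) / (n * eps ** 2))

theta, eps = Fraction(3, 10), Fraction(1, 10)
table = {n: (deviation_probability(n, theta, eps), chebyshev_bound(n, theta, eps))
         for n in (10, 100, 500)}
for n, (exact, bound) in table.items():
    print(n, exact, bound)
```

The bound decreases only like $1/n$, whereas the exact probability falls off much faster, which is the exponential behavior quantified next.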


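Theorem 2.7 (Cramér [18]) Define the cumulant generating function $\mu(\theta) := \log\big(\sum_\omega p(\omega)e^{\theta X(\omega)}\big)$. Then

$$ \liminf_{n\to\infty} \frac{-1}{n}\log p^n\{X^n \ge x\} \ge \max_{\theta\ge 0}(\theta x - \mu(\theta)), \tag{2.163} $$
$$ \limsup_{n\to\infty} \frac{-1}{n}\log p^n\{X^n \ge x\} \le \lim_{x'\to x+0}\max_{\theta\ge 0}(\theta x' - \mu(\theta)), \tag{2.164} $$
$$ \liminf_{n\to\infty} \frac{-1}{n}\log p^n\{X^n \le x\} \ge \max_{\theta\le 0}(\theta x - \mu(\theta)), \tag{2.165} $$
$$ \limsup_{n\to\infty} \frac{-1}{n}\log p^n\{X^n \le x\} \le \lim_{x'\to x-0}\max_{\theta\le 0}(\theta x' - \mu(\theta)). \tag{2.166} $$

If we replace $\{X^n \ge x\}$ and $\{X^n \le x\}$ with $\{X^n > x\}$ and $\{X^n < x\}$, respectively, the same inequalities hold.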


When the probability space consists of finite elements, the function $\max_{\theta\ge 0}(\theta x - \mu(\theta))$ is continuous, i.e., $\lim_{x'\to x+0}\max_{\theta\ge 0}(\theta x' - \mu(\theta)) = \max_{\theta\ge 0}(\theta x - \mu(\theta))$. Hence, the equality of (2.163) holds. Conversely, if the probability space contains an infinite number of elements as the set of real numbers $\mathbb{R}$, we should treat the difference between the RHS and LHS more carefully. Further, the inequality of (2.163) holds without limit, and is equivalent to (2.46) when we replace the real-valued random variable $X(\omega)$ with $-\log q(\omega)$. The same argument holds for (2.165).


Proof Inequality (2.165) is obtained by considering $-X$ in (2.163), and inequality (2.166) is obtained by considering $-X$ in (2.164). Here we prove only inequality (2.163). Inequality (2.164) will be proved at the end of this section.

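For a real-valued random variable $X$ with $X(\omega)$ for each $\omega$,

$$ E_{p^n}\big(e^{n\theta X^n}\big) = E_{p^n}\Big(\prod_{i=1}^n e^{\theta X_i}\Big) = \big(E_p e^{\theta X}\big)^n = e^{n\mu(\theta)}. \tag{2.167} $$

Using the Markov inequality (2.158), we obtain

$$ p^n\{X^n \ge x\} = p^n\{e^{n\theta X^n} \ge e^{n\theta x}\} \le \frac{e^{n\mu(\theta)}}{e^{n\theta x}} = e^{n(\mu(\theta) - \theta x)} \quad \text{for } \theta \ge 0. \tag{2.168} $$

Taking the logarithm of both sides, we have

$$ \frac{-1}{n}\log p^n\{X^n \ge x\} \ge \theta x - \mu(\theta). $$

Let us take the maximum on the RHS with respect to $\theta \ge 0$ and then take the limit on the LHS. We obtain inequality (2.163). ∎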
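
The bound just derived holds for every finite $n$, which the following sketch checks for a Bernoulli variable with $p\{X = 1\} = 0.3$ and threshold $x = 0.5$ (values assumed for illustration). The maximum over $\theta \ge 0$ is computed by a ternary search, exploiting the concavity of $\theta x - \mu(\theta)$, and compared with its known closed form in this case, the relative entropy between the Bernoulli distributions with parameters $x$ and $p$.

```python
import math

def cumulant(theta, p):
    """mu(theta) = log E_p[e^{theta X}] for X in {0, 1} with p{X = 1} = p."""
    return math.log(1.0 - p + p * math.exp(theta))

def legendre_rate(x, p, theta_max=50.0, iterations=200):
    """max over 0 <= theta <= theta_max of theta*x - mu(theta), by ternary search."""
    lo, hi = 0.0, theta_max
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if m1 * x - cumulant(m1, p) < m2 * x - cumulant(m2, p):
            lo = m1
        else:
            hi = m2
    theta = 0.5 * (lo + hi)
    return theta * x - cumulant(theta, p)

def binary_relative_entropy(a, b):
    """Relative entropy between Bernoulli(a) and Bernoulli(b), for 0 < a < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def tail_rate(n, k_min, p):
    """-(1/n) log p^n{n X^n >= k_min}, computed exactly in the log domain."""
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p)
            for k in range(k_min, n + 1)]
    m = max(logs)
    return -(m + math.log(sum(math.exp(v - m) for v in logs))) / n

p, x = 0.3, 0.5
rate = legendre_rate(x, p)
closed_form = binary_relative_entropy(x, p)
finite_n_rates = [tail_rate(n, n // 2, p) for n in (10, 100, 1000)]  # n even, so x*n = n/2
print(rate, closed_form, finite_n_rates)
```

Each finite-$n$ rate stays above the Legendre transform and approaches it as $n$ grows, in line with (2.163) and the remark that equality holds for a finite probability space.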


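This theorem can be extended to the non-i.i.d. case as the Gärtner-Ellis theorem.

Theorem 2.8 (Gärtner [19], Ellis [20]) Let $\{p_n\}$ be a general sequence of probability distributions with the real-valued random variables $X_n$. Define the cumulant generating functions $\mu_n(\theta) := \frac{1}{n}\log\big(\sum_\omega p_n(\omega)e^{n\theta X_n(\omega)}\big)$ and $\mu(\theta) := \lim_{n\to\infty}\mu_n(\theta)$ and the set $G := \{\mu'(\theta)|\theta\}$. Then

$$ \liminf_{n\to\infty} \frac{-1}{n}\log p_n\{X_n \ge x\} \ge \max_{\theta\ge 0}(\theta x - \mu(\theta)), \tag{2.169} $$
$$ \limsup_{n\to\infty} \frac{-1}{n}\log p_n\{X_n \ge x\} \le \inf_{\bar{x}\in G:\bar{x}>x}\max_{\theta\ge 0}(\theta\bar{x} - \mu(\theta)), \tag{2.170} $$
$$ \liminf_{n\to\infty} \frac{-1}{n}\log p_n\{X_n \le x\} \ge \max_{\theta\le 0}(\theta x - \mu(\theta)), \tag{2.171} $$
$$ \limsup_{n\to\infty} \frac{-1}{n}\log p_n\{X_n \le x\} \le \inf_{\bar{x}\in G:\bar{x}<x}\max_{\theta\le 0}(\theta\bar{x} - \mu(\theta)). \tag{2.172} $$

If we replace $\{X_n \ge x\}$ and $\{X_n \le x\}$ by $\{X_n > x\}$ and $\{X_n < x\}$, respectively, the same inequalities hold.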


Inequalities (2.169) and (2.171) can be proved in a similar way to Theorem 2.7.

Next, we apply large deviation arguments to estimation theory. Our arguments will focus not on the mean square error but on the decreasing rate of the probability that the estimated parameter does not belong to the $\epsilon$-neighborhood of the true parameter. To treat the accuracy of a sequence of estimators $\{\hat\theta_n\}$ with a one-parameter probability distribution family $\{p_\theta|\theta \in \mathbb{R}\}$ from the viewpoint of a large deviation, we define

$$ \beta(\{\hat\theta_n\},\theta,\epsilon) := \lim \frac{-1}{n}\log p^n_\theta\{|\hat\theta_n - \theta| > \epsilon\}, \tag{2.173} $$

$$ \alpha(\{\hat\theta_n\},\theta) := \lim_{\epsilon\to 0}\frac{\beta(\{\hat\theta_n\},\theta,\epsilon)}{\epsilon^2}. \tag{2.174} $$

As an approximation, we have

$$ p^n_\theta\{|\hat\theta_n - \theta| > \epsilon\} \cong e^{-n\epsilon^2\alpha(\{\hat\theta_n\},\theta)}. $$

Hence, an estimator functions better when it has larger values of $\beta(\{\hat\theta_n\},\theta,\epsilon)$ and $\alpha(\{\hat\theta_n\},\theta)$.


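Theorem 2.9 (Bahadur [21-23]) Let a sequence of estimators $\{\hat\theta_n\}$ satisfy the weak consistency condition

$$ p^n_\theta\{|\hat\theta_n - \theta| > \epsilon\} \to 0, \quad \forall\epsilon > 0,\ \forall\theta \in \mathbb{R}. \tag{2.175} $$

Then, it follows that

$$ \beta(\{\hat\theta_n\},\theta,\epsilon) \le \inf_{\theta':|\theta'-\theta|>\epsilon} D(p_{\theta'}\|p_\theta). \tag{2.176} $$

Further, if

$$ D(p_{\theta'}\|p_\theta) = \lim_{\bar\theta\to\theta'} D(p_{\bar\theta}\|p_\theta), \tag{2.177} $$

the following also holds:

$$ \alpha(\{\hat\theta_n\},\theta) \le \frac{1}{2}J_\theta. \tag{2.178} $$

If the probability space consists of finite elements, condition (2.177) holds.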
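
The passage from (2.176) to (2.178) rests on the local behavior $D(p_{\theta'}\|p_\theta) \approx \frac{1}{2}J_\theta(\theta'-\theta)^2$. The sketch below illustrates this for the Bernoulli family $p_\theta(1) = \theta$, an assumption made for illustration. In that family the relative entropy grows as $\theta'$ moves away from $\theta$, so the infimum in (2.176) is approached at the boundary $|\theta'-\theta| = \epsilon$.

```python
import math

def binary_relative_entropy(a, b):
    """Relative entropy between Bernoulli(a) and Bernoulli(b), for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def bahadur_bound(theta, eps):
    """Value approached by inf over |theta' - theta| > eps of D(p_theta'||p_theta)."""
    return min(binary_relative_entropy(theta + eps, theta),
               binary_relative_entropy(theta - eps, theta))

theta = 0.3
fisher = 1.0 / (theta * (1.0 - theta))   # J_theta of the Bernoulli family
epsilons = (1e-1, 1e-2, 1e-3)
ratios = [bahadur_bound(theta, eps) / eps ** 2 for eps in epsilons]
print(ratios, fisher / 2)
```

As $\epsilon$ decreases, the ratio tends to $J_\theta/2$, which is the right-hand side of (2.178).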


Proof of Theorem 2.9 Inequality (2.178) is obtained by combining (2.176) with (2.105). Inequality (2.176) may be derived from monotonicity (2.13) as follows. From the consistency condition (2.175), the sequence $a_n := p^n_\theta\{|\hat\theta_n - \theta| > \epsilon\}$ satisfies $a_n \to 0$. Assume that $\epsilon' := |\theta - \theta'| > \epsilon$. Then, when $|\hat\theta_n - \theta'| < \epsilon' - \epsilon$, we have $|\hat\theta_n - \theta| > \epsilon$. Hence, the other sequence $b_n := p^n_{\theta'}\{|\hat\theta_n - \theta| > \epsilon\} \ge p^n_{\theta'}\{|\hat\theta_n - \theta'| < \epsilon' - \epsilon\}$ satisfies $b_n \to 1$ because of the consistency condition (2.175). Thus, monotonicity (2.13) implies that

$$ D(p^n_{\theta'}\|p^n_\theta) \ge b_n(\log b_n - \log a_n) + (1 - b_n)(\log(1 - b_n) - \log(1 - a_n)). $$

Since $nD(p_{\theta'}\|p_\theta) = D(p^n_{\theta'}\|p^n_\theta)$ follows from (2.28) and $-(1 - b_n)\log(1 - a_n) \ge 0$, we have $nD(p_{\theta'}\|p_\theta) \ge -h(b_n) - b_n\log a_n$, and therefore

$$ \frac{-1}{n}\log a_n \le \frac{D(p_{\theta'}\|p_\theta)}{b_n} + \frac{h(b_n)}{nb_n}. \tag{2.179} $$

As the convergence $h(b_n) \to 0$ follows from the convergence $b_n \to 1$, we have

$$ \beta(\{\hat\theta_n\},\theta,\epsilon) \le D(p_{\theta'}\|p_\theta). $$

Considering $\inf_{\theta':|\theta'-\theta|>\epsilon}$, we obtain (2.176). In addition, this proof is valid even if we replace $\{|\hat\theta_n - \theta| > \epsilon\}$ in (2.173) by $\{|\hat\theta_n - \theta| \ge \epsilon\}$. ∎


If no estimator satisfies the equalities in inequalities (2.176) and (2.178), these 
inequalities are not sufficiently useful. The following proposition gives a sufficient 
condition for the equalities of (2.176) and (2.178). 


Proposition 2.1 Suppose that the probability distribution family (2.127) is exponen- 
tial, and the parameter to be estimated is an expectation parameter. If a sequence of 
estimators is given by X"(w") (see (2.160)), then the equality of (2.176) holds. The 
equality of (2.178) also holds. 


It is known that the maximum likelihood estimator $\hat\theta_{n,ML}$ attains the equality in (2.178) if the probability distribution family satisfies some regularity conditions [23, 24].


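Proof of Proposition 2.1 and (2.164) and (2.166) in Theorem 2.7 Now, we prove Proposition 2.1 and its related formulas ((2.164) and (2.166) in Theorem 2.7) as follows. Because (2.129) implies $\max_{\theta'\ge\theta}(\theta' - \theta)(\eta(\theta) + \epsilon) - (\mu(\theta') - \mu(\theta)) = D(p_{\eta(\theta)+\epsilon}\|p_{\eta(\theta)})$, Proposition 2.1 follows from the inequalities

$$ \liminf_{n\to\infty}\frac{-1}{n}\log p^n_{\eta(\theta)}\{X^n(\omega^n) > \eta(\theta) + \epsilon\} \ge \max_{\theta'\ge\theta}(\theta' - \theta)(\eta(\theta) + \epsilon) - (\mu(\theta') - \mu(\theta)), \tag{2.180} $$

$$ \limsup_{n\to\infty}\frac{-1}{n}\log p^n_{\eta(\theta)}\{X^n(\omega^n) > \eta(\theta) + \epsilon\} \le \lim_{\epsilon'\to\epsilon+0} D(p_{\eta(\theta)+\epsilon'}\|p_{\eta(\theta)}) \tag{2.181} $$

for the expectation parameter $\eta$ of the exponential family (2.127) and arbitrary $\epsilon > 0$. When $x = \eta(\theta) + \epsilon = \eta(\bar\theta) > 0$ and $\theta = 0$, the formula (2.181) is the same as (2.164) in Theorem 2.7 with replacing $\ge$ by $>$ in the LHS because $D(p_{\eta(\bar\theta)}\|p_{\eta(0)}) = \bar\theta\eta(\bar\theta) - \mu(\bar\theta) = \max_{\theta'}\theta'\eta(\bar\theta) - \mu(\theta')$. Since the LHS of (2.181) is not smaller than the LHS of (2.164) in this correspondence, (2.181) yields (2.164). Considering $-X$ instead of $X$, (2.164) implies (2.166).

To show (2.180), we choose arbitrary $\epsilon' > \epsilon$ and $\bar\theta$ such that $\eta(\bar\theta) = \eta(\theta) + \epsilon'$. Based on the proof of (2.163) in Theorem 2.7, since the expectation of $e^{n(\theta'-\theta)X^n(\omega^n)}$ under the distribution $p^n_\theta$ is $e^{n(\mu(\theta')-\mu(\theta))}$, we can show that

$$ \frac{-1}{n}\log p^n_\theta\{X^n(\omega^n) > \eta(\theta) + \epsilon\} \ge \max_{\theta'\ge\theta}(\theta' - \theta)(\eta(\theta) + \epsilon) - (\mu(\theta') - \mu(\theta)), \tag{2.182} $$

$$ \frac{-1}{n}\log p^n_{\bar\theta}\{X^n(\omega^n) \le \eta(\theta) + \epsilon\} \ge \max_{\theta'\le\bar\theta}(\theta' - \bar\theta)(\eta(\theta) + \epsilon) - (\mu(\theta') - \mu(\bar\theta)) = D(p_{\eta(\theta)+\epsilon}\|p_{\eta(\theta)+\epsilon'}) > 0. \tag{2.183} $$

Then, (2.182) implies (2.180).

Next, using (2.183), we show (2.181) as follows. According to a discussion similar to the proof of (2.176) in Theorem 2.9, we have

$$ \frac{-1}{n}\log p^n_{\eta(\theta)}\{X^n(\omega^n) > \eta(\theta) + \epsilon\} \le \frac{D(p_{\eta(\theta)+\epsilon'}\|p_{\eta(\theta)})}{b_n} + \frac{h(b_n)}{nb_n} \tag{2.184} $$

for $\epsilon' > \epsilon$, where $b_n := p^n_{\eta(\theta)+\epsilon'}\{X^n(\omega^n) > \eta(\theta) + \epsilon\}$. From (2.183), $b_n \to 1$. Hence, we obtain the last inequality in (2.181). ∎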


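Proof of (2.170) and (2.172) in Theorem 2.8 Finally, we will prove inequality (2.170) in Theorem 2.8, i.e., we will prove that

$$ \limsup_{n\to\infty}\frac{-1}{n}\log p_n\{X_n(\omega) \ge x\} \le \max_{\theta'\ge 0}(\theta'\mu'(\bar\theta) - \mu(\theta')) \tag{2.185} $$

for any $\bar\theta$ satisfying $\mu'(\bar\theta) > x$. Inequality (2.172) can be shown in the same way.

Define the exponential family $p_{n,\theta}(\omega) := p_n(\omega)e^{n\theta X_n(\omega) - n\mu_n(\theta)}$. Similarly to (2.184), we have

$$ \frac{-1}{n}\log p_{n,0}\{X_n(\omega) \ge x\} \le \frac{D(p_{n,\bar\theta}\|p_{n,0})}{nb_n} + \frac{h(b_n)}{nb_n}, $$

where $b_n := p_{n,\bar\theta}\{X_n(\omega) \ge x\}$. From (2.129), $\frac{D(p_{n,\bar\theta}\|p_{n,0})}{n} = \max_{\theta'\ge 0}(\theta'\mu'_n(\bar\theta) - \mu_n(\theta'))$. Hence, if we show that $b_n \to 1$, we obtain (2.185). To show that $b_n \to 1$, similarly to (2.183), the inequality

$$ \frac{-1}{n}\log p_{n,\bar\theta}\{X_n(\omega) < x\} \ge \max_{\theta\le\bar\theta}(\theta - \bar\theta)x - \mu_n(\theta) + \mu_n(\bar\theta) $$

holds. Since the set of differentiable points of $\mu$ is open and $\mu'$ is monotone increasing and continuous in this set, there exists a point $\theta'$ in this set such that

$$ \theta' < \bar\theta, \quad x < \mu'(\theta'). $$

Since $\mu'$ is monotone increasing, we obtain

$$ \max_{\theta\le\bar\theta}(\theta - \bar\theta)x - \mu(\theta) + \mu(\bar\theta) \ge (\theta' - \bar\theta)x - \mu(\theta') + \mu(\bar\theta) \ge (\mu'(\theta') - x)(\bar\theta - \theta') > 0, $$

which implies that $b_n \to 1$. ∎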
Exercises 
2.44 Prove Markov's inequality by using the inequality $\sum_{i:x_i\ge c} p_ix_i \ge c\sum_{i:x_i\ge c} p_i$.

2.45 Using Cramér's theorem and (2.42) and (2.44), show the following equations. Show analogous formulas for (2.46), (2.47), (3.5), and (3.6).

$$ \lim_{n\to\infty}\frac{-1}{n}\log p^n\{p^n_{i_n} \le e^{-nR}\} = -\min_{0\le s}(\psi(s) - sR), \tag{2.186} $$

$$ \lim_{n\to\infty}\frac{-1}{n}\log p^n\{p^n_{i_n} > e^{-nR}\} = -\min_{s\le 0}(\psi(s) - sR). \tag{2.187} $$
n>oo Nn cS 


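2.46 Show that

$$ \lim_{n\to\infty}\frac{-1}{n}\log P^c(p^n, e^{nR}) = -\min_{0\le s\le 1}\frac{\psi(s) - sR}{1-s} \tag{2.188} $$

by first proving (2.189) and then combining this with (2.55). The $\ge$ part may be obtained directly from (2.51).

$$
\begin{aligned}
P^c(p^n, e^{nR}) &\ge \max_{q\in T_n:|T^n_q|>e^{nR}} (|T^n_q| - e^{nR})e^{-n(H(q)+D(q\|p))} \\
&\ge \max_{q\in T_n:\frac{e^{nH(q)}}{(n+1)^d}>e^{nR}} \Big(\frac{e^{nH(q)}}{(n+1)^d} - e^{nR}\Big)e^{-n(H(q)+D(q\|p))} \\
&= \max_{q\in T_n:\frac{e^{nH(q)}}{(n+1)^d}>e^{nR}} \frac{e^{-nD(q\|p)}}{(n+1)^d}\Big(1 - \frac{(n+1)^de^{nR}}{e^{nH(q)}}\Big).
\end{aligned} \tag{2.189}
$$

2.47 Show that

$$ \lim_{n\to\infty}\frac{-1}{n}\log P(p^n, e^{nR}) = -\min_{s\le 0}\frac{\psi(s) - sR}{1-s} \tag{2.190} $$

by first proving (2.191) and then combining this with (2.55). The inequality $\ge$ may be obtained directly from (2.54).

$$ P(p^n, e^{nR}) \ge \max_{q\in T_n:|T^n_q|\le e^{nR}} p^n(T^n_q) \ge \max_{q\in T_n:H(q)\le R}\frac{e^{-nD(q\|p)}}{(n+1)^d}. \tag{2.191} $$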

2.48 Consider the case where $\Omega_n = \{0, 1\}$, $p_n(0) = e^{-na}$, $p_n(1) = 1 - e^{-na}$, $X_n(0) = a$, $X_n(1) = -b$ with $a, b > 0$. Show that $\mu(\theta) = -\min\{(1-\theta)a, \theta b\}$ and the following for $-b < x < a$:

$$ \max_{\theta>0}(x\theta - \mu(\theta)) = \frac{a(x+b)}{a+b} < a, \quad \lim_{n\to\infty}\frac{-1}{n}\log p_n\{X_n \ge x\} = a. $$

It gives a counterexample to the Gärtner-Ellis theorem in the nondifferentiable case.
2.5 Continuity and Axiomatic Approach


In this section, we consider how to characterize the entropy $H(p)$ by axioms. Indeed, when a real-valued function $S$ satisfies several axiomatic rules, the function $S$ must be the entropy $H(p)$ given in (2.2). Here, we consider the following five axioms for a real-valued function $S$ of distributions, which are close to the axioms by Khinchin [25].

K1 (Normalization)
$$ S(p_{\mathrm{mix},\{0,1\}}) = \log 2. \tag{2.192} $$
K2 (Continuity) $S$ is continuous on $\mathcal{P}(\{0,1\})$.
K3 (Nonnegativity) $S$ is nonnegative.
K4 (Expandability) For any function $f$, we have
$$ S(P_X) = S(P_{f(X)X}). \tag{2.193} $$
K5 (Chain rule) When $P_{XY}$ is a joint distribution for $X$ and $Y$, the marginal distribution $P_X$ and the conditional distribution $P_{Y|X=x}$ satisfy
$$ S(P_{XY}) = S(P_X) + \sum_x P_X(x)S(P_{Y|X=x}). \tag{2.194} $$

Here, we consider another set of axioms as follows.

A1 (Normalization)
$$ S(p_{\mathrm{mix},\{0,1\}}) = \log 2. \tag{2.195} $$
A2 (Weak additivity)
$$ S(p^n) = nS(p). \tag{2.196} $$
A3 (Monotonicity) For any function $f$, we have
$$ S(P_X) \ge S(P_{f(X)}). \tag{2.197} $$
A4 (Asymptotic continuity) Let $p_n$ and $q_n$ be distributions on the set $\{0,1\}^n$. When $d_1(p_n, q_n) \to 0$, we have
$$ \frac{|S(p_n) - S(q_n)|}{n} \to 0. \tag{2.198} $$

Then, the following theorem shows the uniqueness of a function satisfying one of the above sets of axioms.
Theorem 2.10 For a function $S$ defined on the set of distributions, the following three conditions are equivalent.

(1) $S$ satisfies Axioms K1-K5.
(2) $S$ satisfies Axioms A1-A4.
(3) $S(p) = -\sum_i p_i\log p_i$.
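
Before the proof, it may help to confirm on a small example that the Shannon entropy in condition (3) does satisfy the algebraic axioms. The sketch below checks K1, K4, K5, A2, and A3 numerically for one hypothetical joint distribution on $\{0,1\}^2$. The helper names and the choice of $f$ are assumptions made for illustration.

```python
import math

def entropy(dist):
    """H(P) = -sum P log P for a dict mapping outcomes to probabilities."""
    return -sum(v * math.log(v) for v in dist.values() if v > 0)

def marginal_x(joint):
    out = {}
    for (x, _y), v in joint.items():
        out[x] = out.get(x, 0.0) + v
    return out

def conditional_y(joint, x):
    px = sum(v for (a, _y), v in joint.items() if a == x)
    return {y: v / px for (a, y), v in joint.items() if a == x}

def pushforward(dist, f):
    """Distribution of f(W) when W is distributed according to dist."""
    out = {}
    for w, v in dist.items():
        out[f(w)] = out.get(f(w), 0.0) + v
    return out

def product(p, q):
    return {(a, b): pa * qb for a, pa in p.items() for b, qb in q.items()}

joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}  # hypothetical P_XY
p_x = marginal_x(joint)

k1 = entropy({0: 0.5, 1: 0.5})                                          # should equal log 2
k4 = entropy(pushforward(joint, lambda w: (w[0] ^ w[1], w)))            # S(P_{f(W)W})
chain = entropy(p_x) + sum(p_x[x] * entropy(conditional_y(joint, x)) for x in p_x)
additive = entropy(product(p_x, p_x))                                   # S(p^2)
coarse = entropy(pushforward(joint, lambda w: w[0] ^ w[1]))             # S(P_{f(W)})
print(k1, k4, chain, additive, coarse)
```

Such a check is of course no substitute for the proof, since it says nothing about uniqueness or about the asymptotic continuity axiom A4.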


Before proceeding to the proof of Theorem 2.10, we consider the asymptotic 
convertibility for the independent and identical distribution. 


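Lemma 2.5 For a distribution $p$ on $\Omega$ and an arbitrary real number $\epsilon > 0$, there exists a sequence of maps $f_n$ from $\Omega^n$ to $\Omega_n := \{0,1\}^{\lfloor (H(p)-\epsilon)n/\log 2\rfloor}$ such that $d_1(p^n \circ f_n^{-1}, p_{\mathrm{mix},\Omega_n}) \to 0$.


Lemma 2.6 For a distribution $p$ on $\Omega$ and an arbitrary real number $\epsilon > 0$, there exists a sequence of maps $f_n$ from $\Omega'_n := \{0,1\}^{\lceil (H(p)+\epsilon)n/\log 2\rceil}$ to $\Omega^n$ such that $d_1(p^n, p_{\mathrm{mix},\Omega'_n} \circ f_n^{-1}) \to 0$.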

These two lemmas show that the entropy $H(p)$ gives the asymptotic conversion rate between the independent and identical distribution and the uniform distribution. Rényi entropy $H_{1+s}(p)$ also satisfies Axioms K1-K4 and A1-A3. However, it does not satisfy K5 (Chain rule) or A4 (Asymptotic continuity). Indeed,
although the quantity e~”) satisfies A4 (Asymptotic continuity)™*?*' as well as A3 
(Monotonicity), it does not satisfy A2 (Weak additivity). Only the information quan- 
tity satisfying Axioms K1-K5 or A1-A4 gives the asymptotic conversion between 
the independent and identical distribution and the uniform distribution. Hence, we 
can conclude that K5 (Chain rule) and A4 (Asymptotic continuity) are crucial for 
the asymptotic conversion. 
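
The failure of the chain rule for Rényi entropy can be seen on a single example. The sketch below assumes the order-2 Rényi entropy $-\log\sum_i p_i^2$ and one hypothetical joint distribution. Weak additivity (A2) holds exactly, while the two sides of the chain rule (K5) differ.

```python
import math

def renyi_entropy(dist, order=2.0):
    """Renyi entropy log(sum P^order) / (1 - order) for order != 1."""
    return math.log(sum(v ** order for v in dist.values() if v > 0)) / (1.0 - order)

def marginal_x(joint):
    out = {}
    for (x, _y), v in joint.items():
        out[x] = out.get(x, 0.0) + v
    return out

def conditional_y(joint, x):
    px = sum(v for (a, _y), v in joint.items() if a == x)
    return {y: v / px for (a, y), v in joint.items() if a == x}

def product(p, q):
    return {(a, b): pa * qb for a, pa in p.items() for b, qb in q.items()}

joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}  # hypothetical P_XY
p_x = marginal_x(joint)

lhs_chain = renyi_entropy(joint)
rhs_chain = renyi_entropy(p_x) + sum(
    p_x[x] * renyi_entropy(conditional_y(joint, x)) for x in p_x)
additivity_gap = renyi_entropy(product(p_x, p_x)) - 2 * renyi_entropy(p_x)
print(lhs_chain, rhs_chain, additivity_gap)
```

The gap between the two chain-rule sides is small but nonzero for this example, whereas the additivity gap vanishes up to rounding.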


Proof of Theorem 2.10 First, we show (1) => (2). A2 (Weak additivity) follows 
from K5 (Chain rule). A3 (Monotonicity) follows from K3 (Nonnegativity), K4 
(Expandability), and K5 (Chain rule) by the same discussion as (2.6). 

Now, we start to show A4 (Asymptotic continuity). Since the set P({0, 1}) is 
compact, due to K2 (Continuity), S is uniformly continuous on P({0, 1}). So, there 
exists the maximum value R := maxpepio,1}) S(p). For any € > 0, we choose 6 > 0 
such that |S(p) — S(q)| < € for any d;(p, q) < 6. Consider two distributions PY, 
and Be on the set {0, 1}” such that 5, := 2d, (Pe) goes to zero as n > oo. 
Then, we can choose a sufficiently large integer N such that 6, < S forn > N. 

Here, X; denotes the random variable on the i-th set {0, 1} in {0, 1}” and X,, := 
(X1,..., X,). For any integer i < n, we have 


> 


Xi-1 


Py, (41-1) — Py, i-1)] < bn. 


Also, for any value x; € {0, 1}, we have 
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n n / p” / 
Pk. (-0)[ Phin ae (X)) — Pxyx, axe O) 


Xi-1 


rw /} wi] 
Ss > Pk a) Pitas (x;) ~~ Py,, i-DP xx =x) (xj) 


Xi-1 


+ Pi i) — Py, Pauses s OD 


Xi-1 
e > Py (xi) — Py (xi)| + by Py, ,i-1) — Py, 2) 
Xj Xj-1 
<On =P On = 26n- (2.199) 


We define the function Y,/(x;-1) ‘= |Py.1y, j=x,_, (xt) — Pretec (x;)|. Apply- 
ing Markov inequality to the random variable Y,:(X;-1), from (2.199), we have the 
inequality 


nN n ! Dp” ! 26n 
PX, (%i-11 [Pkpx,_ =x, 7) — Prax, =x.) = 6) = 1- 5 (2.200) 


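Let $\Omega_i$ be the set of $\mathbf{x}_{i-1} = (x_1, \ldots, x_{i-1})$ satisfying the condition inside the braces on the LHS of (2.200). Then, K3 (Nonnegativity) implies that

\[
\begin{aligned}
&\sum_{\mathbf{x}_{i-1}} P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) \left| S(P^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) - S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \right| \\
&= \sum_{\mathbf{x}_{i-1} \in \Omega_i} P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) \left| S(P^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) - S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \right| \\
&\quad + \sum_{\mathbf{x}_{i-1} \in \Omega_i^c} P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) \left| S(P^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) - S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \right| \\
&\le \epsilon \sum_{\mathbf{x}_{i-1} \in \Omega_i} P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) + \sum_{\mathbf{x}_{i-1} \in \Omega_i^c} P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) R \\
&\le \epsilon + \frac{2\delta_n}{\delta} R \le \epsilon + \epsilon = 2\epsilon.
\end{aligned}
\tag{2.201}
\]

Also, K3 (Nonnegativity) implies that

\[
\begin{aligned}
&\sum_{\mathbf{x}_{i-1}} \left| P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) - {P'}^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) \right| S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \\
&\le \sum_{\mathbf{x}_{i-1}} \left| P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) - {P'}^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) \right| R \le \delta_n R \le \frac{\epsilon\delta}{2}.
\end{aligned}
\tag{2.202}
\]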


Xi-1 


On the other hand, K5 (Chain rule) implies that

\[
S(P^n_{\mathbf{X}_n}) = \sum_{i=1}^{n} \sum_{\mathbf{x}_{i-1}} P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) \, S(P^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}).
\tag{2.203}
\]


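Thus, we have

\[
\begin{aligned}
&\left| S(P^n_{\mathbf{X}_n}) - S({P'}^n_{\mathbf{X}_n}) \right| \\
&\overset{(a)}{=} \left| \sum_{i=1}^{n} \sum_{\mathbf{x}_{i-1}} \left( P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) S(P^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) - {P'}^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \right) \right| \\
&\le \sum_{i=1}^{n} \sum_{\mathbf{x}_{i-1}} \Big( \left| P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) S(P^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) - P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \right| \\
&\qquad + \left| P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) - {P'}^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \right| \Big) \\
&= \sum_{i=1}^{n} \sum_{\mathbf{x}_{i-1}} \Big( P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) \left| S(P^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) - S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \right| \\
&\qquad + \left| P^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) - {P'}^n_{\mathbf{X}_{i-1}}(\mathbf{x}_{i-1}) \right| S({P'}^n_{X_i|\mathbf{X}_{i-1}=\mathbf{x}_{i-1}}) \Big) \\
&\overset{(b)}{\le} \sum_{i=1}^{n} \left( 2\epsilon + \frac{\epsilon\delta}{2} \right) = n \left( 2\epsilon + \frac{\epsilon\delta}{2} \right),
\end{aligned}
\]

where (a) follows from (2.203), and (b) follows from (2.201) and (2.202). Hence,
A4 (Asymptotic continuity) holds.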
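
The chain-rule decomposition (2.203) can be checked numerically for a concrete quantity. The following is a minimal sketch that takes the Shannon entropy as an instance of $S$ and uses a hypothetical joint distribution on $\{0,1\}^3$ (the weights are arbitrary illustration values, not taken from the text).

```python
import math
from itertools import product


def shannon(dist):
    """Shannon entropy (in nats) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


# Hypothetical joint distribution on {0,1}^3 (arbitrary weights, for illustration).
n = 3
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
joint = {x: w / total for x, w in zip(product((0, 1), repeat=n), weights)}


def marginal(prefix_len):
    """Distribution of the first prefix_len coordinates (X_1, ..., X_prefix_len)."""
    out = {}
    for x, p in joint.items():
        out[x[:prefix_len]] = out.get(x[:prefix_len], 0.0) + p
    return out


# Right-hand side of (2.203): sum over i and over prefixes x_{i-1}.
chain_sum = 0.0
for i in range(1, n + 1):
    prev = marginal(i - 1)  # distribution of (X_1, ..., X_{i-1})
    cur = marginal(i)       # distribution of (X_1, ..., X_i)
    for prefix, p_prefix in prev.items():
        cond = {b: cur[prefix + (b,)] / p_prefix for b in (0, 1)}
        chain_sum += p_prefix * shannon(cond)

joint_entropy = shannon(joint)  # left-hand side of (2.203)
print(joint_entropy, chain_sum)
```

Both printed values should agree up to floating-point rounding.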

Next, we show (2) => (3). For a distribution $p$ and $\epsilon > 0$, according to Lemma
2.5, we choose a sequence of maps $f_n$. A1 (Normalization) and A2 (Weak additivity) imply that $S(p_{\mathrm{mix},\Omega_n}) = \lfloor (H(p) - \epsilon) n / \log 2 \rfloor \log 2$. A2 (Weak additivity) and
A3 (Monotonicity) imply that $S(p^n \circ f_n^{-1}) \le S(p^n) \le n S(p)$. By using these relations,
A4 (Asymptotic continuity) implies that $H(p) - \epsilon \le S(p)$. Since $\epsilon$ is arbitrary, we
have $H(p) \le S(p)$. Similarly, using Lemma 2.6, we can show that $H(p) \ge S(p)$.
Thus, we obtain $H(p) = S(p)$.

Now, we show (3) => (1). K1 (Normalization), K2 (Continuity), and K3 (Nonnegativity) are obvious from the definition (2.2). K4 (Expandability) and K5 (Chain
rule) follow from (2.4) and (2.5), respectively. ∎


To show Lemmas 2.5 and 2.6, we prepare another lemma as follows. 


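Lemma 2.7 (Han [26, Lemma 2.1.1.]) For any two distributions $P_X$ on $\mathcal{X}$ and $P_Y$
on $\mathcal{Y}$, there exists a function $f$ from $\mathcal{X}$ to $\mathcal{Y}$ such that

\[
d_1(P_X \circ f^{-1}, P_Y) \le e^{-\gamma} + \max\left( P_X(S(a+\gamma)^c), P_Y(T(a)^c) \right),
\tag{2.204}
\]

where

\[
S(a) := \{ x \in \mathcal{X} \mid P_X(x) \le e^{-a} \}, \qquad T(a) := \{ y \in \mathcal{Y} \mid P_Y(y) \ge e^{-a} \}.
\]

Proof We define a map $f$ from $\mathcal{X}$ to $\mathcal{Y}$ as follows. We number all of the elements of
$T(a)$ as $T(a) = \{y_1, \ldots, y_n\}$. So, we have

\[
n = |T(a)| \le e^{a}.
\tag{2.205}
\]

For this purpose, we define $n$ disjoint subsets $f^{-1}(y_1), \ldots, f^{-1}(y_n)$ as subsets of
$\mathcal{X}$. First, we choose a subset $f^{-1}(y_1) \subset S(a+\gamma)$ such that

\[
\sum_{x \in f^{-1}(y_1)} P_X(x) \le P_Y(y_1) < \sum_{x \in f^{-1}(y_1)} P_X(x) + P_X(x')
\]

for any $x' \in S(a+\gamma) \setminus f^{-1}(y_1)$. Next, we choose a subset $f^{-1}(y_2) \subset S(a+\gamma) \setminus f^{-1}(y_1)$ such that

\[
\sum_{x \in f^{-1}(y_2)} P_X(x) \le P_Y(y_2) < \sum_{x \in f^{-1}(y_2)} P_X(x) + e^{-(a+\gamma)}.
\]

We repeat this selection as long as possible. Let $y_l$ be the final element $y$ whose
inverse set $f^{-1}(y)$ can be defined in this way.

Consider the case $l = n$. We reselect $f^{-1}(y_n)$ to be $\left( \bigcup_{i=1}^{n-1} f^{-1}(y_i) \right)^c$. Then, the set
$f^{-1}(y)$ is empty for $y \in T(a)^c$. Due to Exercise 2.12, we have

\[
\begin{aligned}
d_1(P_X \circ f^{-1}, P_Y) &\le \sum_{i=1}^{n-1} \left| P_X \circ f^{-1}(y_i) - P_Y(y_i) \right| + \sum_{y \in T(a)^c} \left| P_X \circ f^{-1}(y) - P_Y(y) \right| \\
&\le \sum_{i=1}^{n-1} e^{-(a+\gamma)} + \sum_{y \in T(a)^c} P_Y(y) \overset{(a)}{\le} e^{-\gamma} + P_Y(T(a)^c),
\end{aligned}
\]

where (a) follows from (2.205).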
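Next, we consider the case $l < n$. We define $f^{-1}(y_{l+1}) := \left( \bigcup_{i=1}^{l} f^{-1}(y_i) \right)^c$.
Then, for $y \in \{y_1, \ldots, y_{l+1}\}^c$, $f^{-1}(y)$ is empty. Since

\[
\sum_{i=1}^{l+1} P_Y(y_i) \ge \sum_{x \in S(a+\gamma)} P_X(x),
\]

we have

\[
\sum_{y \in \{y_1, \ldots, y_{l+1}\}^c} P_Y(y) \le \sum_{x \in S(a+\gamma)^c} P_X(x).
\tag{2.206}
\]

Hence, due to Exercise 2.12, we have

\[
\begin{aligned}
d_1(P_X \circ f^{-1}, P_Y) &\le \sum_{i=1}^{l} \left| P_X \circ f^{-1}(y_i) - P_Y(y_i) \right| + \sum_{y \in \{y_1, \ldots, y_{l+1}\}^c} \left| P_X \circ f^{-1}(y) - P_Y(y) \right| \\
&\le \sum_{i=1}^{l} e^{-(a+\gamma)} + \sum_{y \in \{y_1, \ldots, y_{l+1}\}^c} P_Y(y) \\
&\overset{(a)}{\le} e^{-\gamma} + \sum_{x \in S(a+\gamma)^c} P_X(x) = e^{-\gamma} + P_X(S(a+\gamma)^c),
\end{aligned}
\]

where (a) follows from (2.205) and (2.206). ∎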
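
The greedy selection used in this proof can be written out as a short program. The sketch below is one possible reading of the construction, not a definitive implementation: the helper names `greedy_map`, `pushforward`, and `l1_distance` are hypothetical, the example distributions and the values of `a` and `gamma` are arbitrary illustration choices, and the sets are $S(a+\gamma) = \{x \mid P_X(x) \le e^{-(a+\gamma)}\}$ and $T(a) = \{y \mid P_Y(y) \ge e^{-a}\}$ as in Lemma 2.7.

```python
import math
from itertools import product


def l1_distance(p, q):
    """Variational distance d_1(p, q) = (1/2) * sum |p - q| for dict-valued distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def greedy_map(px, py, a, gamma):
    """Build f: X -> Y by the greedy selection sketched in the proof of Lemma 2.7."""
    s_set = [x for x in px if px[x] <= math.exp(-(a + gamma))]  # S(a + gamma)
    t_set = [y for y in py if py[y] >= math.exp(-a)]            # T(a)
    if not t_set:  # degenerate case: the bound (2.204) is trivial here
        target = next(iter(py))
        return {x: target for x in px}
    f = {}
    remaining = s_set
    for idx, y in enumerate(t_set):
        mass, rest = 0.0, []
        for x in remaining:
            if mass + px[x] <= py[y]:  # keep the preimage mass at most P_Y(y)
                f[x] = y
                mass += px[x]
            else:
                rest.append(x)
        remaining = rest
        # Last element of T(a), or S(a + gamma) is used up: y receives the complement.
        if idx == len(t_set) - 1 or not remaining:
            for x in px:
                f.setdefault(x, y)
            break
    return f


def pushforward(px, f):
    """Distribution of f(X) when X is distributed according to px."""
    out = {}
    for x, p in px.items():
        out[f[x]] = out.get(f[x], 0.0) + p
    return out


# Example: P_X = p^n with p = (0.7, 0.3) and n = 8; P_Y uniform on 8 points.
p, n = (0.7, 0.3), 8
px = {x: math.prod(p[b] for b in x) for x in product((0, 1), repeat=n)}
py = {y: 1 / 8 for y in range(8)}
a, gamma = 2.1, 1.0

f = greedy_map(px, py, a, gamma)
dist = l1_distance(pushforward(px, f), py)
s_c = sum(q for q in px.values() if q > math.exp(-(a + gamma)))  # P_X(S(a+gamma)^c)
t_c = sum(q for q in py.values() if q < math.exp(-a))            # P_Y(T(a)^c)
bound = math.exp(-gamma) + max(s_c, t_c)
print(dist, bound)
```

For these parameters the printed distance should not exceed the printed bound, in line with (2.204).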


Now, using Lemma 2.7, we show Lemmas 2.5 and 2.6. 


Proof of Lemma 2.5 We apply Lemma 2.7 to the case when $a = (H(p) - \epsilon) n$, $\gamma = n\epsilon/2$, and $P_X$ and $P_Y$ are $p^n$ and the uniform distribution $p_{\mathrm{mix},\Omega_n}$ on the set $\Omega_n = \{0,1\}^{\lfloor (H(p)-\epsilon) n / \log 2 \rfloor}$, respectively. Then, $P_Y(T(a)^c) = 0$ and $e^{-\gamma} \to 0$. Since the RHS
of (2.44) goes to zero with $R < H(p)$, we have $P_X(S(a+\gamma)^c) \to 0$. Therefore, we
obtain the desired argument. ∎


Proof of Lemma 2.6 We apply Lemma 2.7 to the case when $a + \gamma = (H(p) + \epsilon) n$,
$\gamma = n\epsilon/2$, and $P_Y$ and $P_X$ are $p^n$ and the uniform distribution $p_{\mathrm{mix},\Omega_n}$ on the set $\Omega_n = \{0,1\}^{\lceil (H(p)+\epsilon) n / \log 2 \rceil}$, respectively. Then, $P_X(S(a+\gamma)^c) = 0$ and $e^{-\gamma} \to 0$. Since the
RHS of (2.42) goes to zero with $R > H(p)$, we have $P_Y(T(a)^c) \to 0$. Therefore,
we obtain the desired argument. ∎
Exercises 


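2.49 Show that the Rényi entropy $H_{1+s}(p)$ and the min entropy $H_{\min}(p)$ do not
satisfy A4 (Asymptotic continuity) for $s > 0$ as follows.
(a) Define the distribution $p_{d,\epsilon}$ on $\{0, 1, \ldots, d-1\}$ by

\[
p_{d,\epsilon}(i) := \begin{cases} \frac{1}{d} + \epsilon & \text{if } i = 0 \\ \frac{1}{d} - \frac{\epsilon}{d-1} & \text{if } i > 0. \end{cases}
\tag{2.207}
\]

Show that $d_1(p_{d,\epsilon}, p_{\mathrm{mix},d}) = \epsilon$.
(b) Show that $H_{\min}(p_{d,\epsilon}) = \log d - \log(1 + d\epsilon)$.
(c) Assume that $d\epsilon \to \infty$ as $d \to \infty$. Show that $\frac{H_{\min}(p_{\mathrm{mix},d}) - H_{\min}(p_{d,\epsilon})}{\log d} = 1 + \frac{\log \epsilon}{\log d} + O\left(\frac{1}{d\epsilon \log d}\right)$ as $d \to \infty$.
(d) Show that $H_{1+s}(p_{d,\epsilon}) = \log d - \frac{1}{s} \log\left( \frac{1}{d}(1 + d\epsilon)^{1+s} + \frac{d-1}{d}\left(1 - \frac{d\epsilon}{d-1}\right)^{1+s} \right)$.
(e) Assume that $\frac{1}{d}(d\epsilon)^{1+s} \to \infty$ as $d \to \infty$. Show that $\frac{H_{1+s}(p_{\mathrm{mix},d}) - H_{1+s}(p_{d,\epsilon})}{\log d} = 1 + \frac{(1+s) \log \epsilon}{s \log d} + O\left(\frac{d (d\epsilon)^{-(1+s)}}{\log d}\right)$ as $d \to \infty$.

2.50 Show that the Rényi entropy $H_{1-s}(p)$ and the max entropy $H_{\max}(p)$ do not
satisfy A4 (Asymptotic continuity) for $s \in (0, 1)$ as follows.
(a) Define the distribution $p'_{d,\epsilon}$ on $\{0, 1, \ldots, d-1\}$ by

\[
p'_{d,\epsilon}(i) := \begin{cases} 1 - \epsilon & \text{if } i = 0 \\ \frac{\epsilon}{d-1} & \text{if } i > 0. \end{cases}
\tag{2.208}
\]

Show that $d_1(p'_{d,\epsilon}, p'_{d,0}) = \epsilon$.
(b) Show that $\frac{H_{\max}(p'_{d,\epsilon}) - H_{\max}(p'_{d,0})}{\log d} = 1$ for $\epsilon > 0$.
(c) Show that $H_{1-s}(p'_{d,\epsilon}) = \frac{1}{s} \log\left( (1-\epsilon)^{1-s} + (d-1)\left(\frac{\epsilon}{d-1}\right)^{1-s} \right)$.
(d) Show that $\frac{H_{1-s}(p'_{d,\epsilon}) - H_{1-s}(p'_{d,0})}{\log d} = \frac{(d-1)^{s} \epsilon^{1-s}}{s \log d} + O(\epsilon) + O(\epsilon^{2(1-s)})$ as
$\epsilon \to 0$.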


2.51 Show that $e^{-H_2(p)}$ satisfies A4 (Asymptotic continuity) for $s > 0$ by showing
the following inequality. That is, show that the continuity of $e^{-H_2(p)}$ does not depend
on the cardinality of the supports of $p$ and $q$.

\[
\left| e^{-H_2(p)} - e^{-H_2(q)} \right| \le 2 d_1(p, q).
\tag{2.209}
\]


2.6 Large Deviation on Sphere 


Next, we consider a probability distribution on the set of pure states. In quantum
information, if we have no information on the given system $\mathcal{H} = \mathbb{C}^l$, it is natural to
assume that the probability distribution is invariant with respect to the action of the
unitary group $U(l)$ on the set of pure states. Such a distribution is unique and is called
the Haar measure, which is denoted by $\mu_{\mathcal{H}}$. Since the normalized vector is given as
$|\phi\rangle \in \mathbb{C}^l$ satisfying $\|\phi\| = 1$, the distribution $\mu_{\mathcal{H}}$ is given as a distribution on the set
of pure states satisfying that

\[
\int_B \mu_{\mathcal{H}}(d\phi) = \int_B \mu_{\mathcal{H}}(d U\phi) \quad \text{for } U \in U(l).
\tag{2.210}
\]


That is, the Haar measure is defined as the unique distribution satisfying (2.210).
When the pure state is regarded as an element of the $(2l-1)$-dimensional sphere
$S^{2l-1}$, the distribution $\mu_{\mathcal{H}}$ is given as a distribution on the $(2l-1)$-dimensional sphere.
More generally, the Haar measure $\mu_{S^n}$ on the $n$-dimensional sphere $S^n$ is given as the
distribution satisfying that

\[
\int_B \mu_{S^n}(dx) = \int_B \mu_{S^n}(d gx) \quad \text{for } g \in O(n+1).
\tag{2.211}
\]


The Haar measure has several useful properties. For example, the invariance guarantees that

\[
\int |\phi\rangle\langle\phi| \, \mu_{\mathcal{H}}(d\phi) = \frac{1}{l} I.
\tag{2.212}
\]


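Further, when $\mathcal{H} = \mathbb{C}^l$ is spanned by the basis $\{|e_i\rangle\}_{i=1}^{l}$, for an $n$-th permutation $\pi$, we
define the unitary $U_\pi$ on $\mathcal{H}^{\otimes n}$ as

\[
U_\pi(|u_1, \ldots, u_n\rangle) = |u_{\pi(1)}, \ldots, u_{\pi(n)}\rangle.
\tag{2.213}
\]

Then, we define the $n$-th symmetric subspace $\mathcal{H}_{s,n} \subset \mathcal{H}^{\otimes n}$ as the space spanned by
$\{\sum_\pi U_\pi(|e_1, \ldots, e_1, e_2, \ldots, e_2, \ldots, e_l, \ldots, e_l\rangle)\}$. The dimension of $\mathcal{H}_{s,n}$ is $\binom{l+n-1}{l-1}$,
and the invariance implies that

\[
\int |\phi\rangle\langle\phi|^{\otimes n} \mu_{\mathcal{H}}(d\phi) = \frac{1}{\binom{l+n-1}{l-1}} P_{s,n},
\tag{2.214}
\]

where $P_{s,n}$ is the projection to $\mathcal{H}_{s,n}$. When a pure state $\rho$ on $\mathcal{H}^{\otimes n}$ is invariant for $U_\pi$
with an arbitrary $n$-th permutation $\pi$, the pure state $\rho$ is a state on $\mathcal{H}_{s,n}$. Hence, we
have

\[
\rho \le \binom{l+n-1}{l-1} \int |\phi\rangle\langle\phi|^{\otimes n} \mu_{\mathcal{H}}(d\phi).
\tag{2.215}
\]

Here, $\binom{l+n-1}{l-1}$ is upper bounded by $(n+1)^{l-1}$.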
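
As a quick illustration of the last statement, the dimension $\binom{l+n-1}{l-1}$ of the $n$-th symmetric subspace and the bound $(n+1)^{l-1}$ can be tabulated for small parameters. This is only a finite sanity check; the function name `sym_dim` is a hypothetical helper.

```python
from math import comb


def sym_dim(l, n):
    """Dimension of the n-th symmetric subspace of (C^l)^{(x) n}: C(l+n-1, l-1)."""
    return comb(l + n - 1, l - 1)


# (dimension, polynomial bound) for small local dimensions l and copy numbers n.
table = {(l, n): (sym_dim(l, n), (n + 1) ** (l - 1))
         for l in range(1, 6) for n in range(1, 8)}

for (l, n), (dim, bound) in sorted(table.items()):
    print(f"l={l} n={n} dim={dim} bound={bound}")
```

The polynomial growth in $n$ for fixed $l$ is what makes the operator inequality (2.215) useful.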

In quantum information, we often consider the stochastic behavior of a function
of a pure state under the Haar measure $\mu_{\mathcal{H}}$. In order to discuss this issue, we need the
following preparation. First, we define the median of a real-valued random variable
$X$ as

\[
\mathrm{Med}_p(X) := \frac{\mathrm{Med}^-_p(X) + \mathrm{Med}^+_p(X)}{2},
\tag{2.216}
\]
\[
\mathrm{Med}^-_p(X) := \inf\{ r \mid p\{x \mid x \ge r\} \le 1/2 \},
\tag{2.217}
\]
\[
\mathrm{Med}^+_p(X) := \sup\{ r \mid p\{x \mid x \le r\} \le 1/2 \}.
\tag{2.218}
\]


The cumulative distribution function of the real-valued random variable $X$ is defined
as

\[
F_{X,p}(a) := p\{x \mid x \le a\},
\tag{2.219}
\]

where $p(S)$ is defined for a subset $S \subset \Omega$ as

\[
p(S) := \int_{x \in S} p(dx).
\tag{2.220}
\]


Then, we have the following lemma.

Lemma 2.8 When two real-valued random variables $X$ and $Y$ satisfy $F_{X,p} \le F_{Y,p}$, we have $\mathrm{E}_p X \ge \mathrm{E}_p Y$.
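
Lemma 2.8 is the usual statement that a pointwise smaller cumulative distribution function gives a larger expectation. A minimal sketch with two hypothetical discrete random variables (the values and probabilities are arbitrary illustration data):

```python
def cdf(dist, a):
    """F(a) = probability that the variable is at most a."""
    return sum(p for x, p in dist.items() if x <= a)


def mean(dist):
    """Expectation of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


# Hypothetical distributions of X and Y.
X = {1: 0.2, 2: 0.3, 4: 0.5}
Y = {0: 0.3, 2: 0.4, 3: 0.3}

# For step functions it suffices to compare the CDFs at all jump points.
points = sorted(set(X) | set(Y))
cdf_dominated = all(cdf(X, a) <= cdf(Y, a) + 1e-12 for a in points)

print(cdf_dominated, mean(X), mean(Y))
```

Here $F_X \le F_Y$ holds at every jump point, and the expectation of $X$ is indeed not smaller than that of $Y$.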


Then, we define the metric $d(x, y)$ between two wave functions $x$ and $y$ in $S^{2l-1}$
as

\[
d(x, y) := \cos^{-1} \mathrm{Re}\langle x, y\rangle \in [0, \pi].
\tag{2.221}
\]

Then, for a wave function $y \in S^{2l-1}$, we define the subset $D(y, r)$ as

\[
D(y, r) := \{ x \in S^{2l-1} \mid d(x, y) \le r \}.
\tag{2.222}
\]

Then, the probability $\mu_{S^{2l-1}}(D(y, r))$ depends only on $r$. For a given probability
$p \in (0, 1)$, we define $r(p)$ as $\mu_{S^{2l-1}}(D(y, r(p))) = p$. For a given subset $\Omega \subset S^{2l-1}$,
we define the subset $\Omega_\epsilon$ for $\epsilon > 0$ as

\[
\Omega_\epsilon := \{ x \in S^{2l-1} \mid d(x, y) \le \epsilon, \ \exists y \in \Omega \}.
\tag{2.223}
\]


Then, we prepare the following fundamental lemma.

Lemma 2.9 ([27, Theorem 2.1]) For a given $p \in (0, 1)$ and $\epsilon > 0$, we have

\[
\min\{ \mu_{S^{2l-1}}(\Omega_\epsilon) \mid \mu_{S^{2l-1}}(\Omega) = p \} = \mu_{S^{2l-1}}(D(y, r(p))_\epsilon),
\tag{2.224}
\]

where the set $D(y, r(p))_\epsilon$ is illustrated as Fig. 2.2.


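Proof We give only an intuitive proof. First, we consider an infinitesimal $\epsilon > 0$. In
this case, it is enough to consider the boundary of $\Omega$ because the size of the boundary
of $\Omega$ is proportional to $\left.\frac{d\mu_{S^{2l-1}}(\Omega_\epsilon)}{d\epsilon}\right|_{\epsilon=0}$. We can intuitively find that the set $D(y, r(p))$
has the minimum boundary among the subsets $\Omega$ satisfying $\mu_{S^{2l-1}}(\Omega) = p$. That is,
we obtain

\[
\left.\frac{d\mu_{S^{2l-1}}(\Omega_\epsilon)}{d\epsilon}\right|_{\epsilon=0} \ge \left.\frac{d\mu_{S^{2l-1}}(D(y, r(p))_\epsilon)}{d\epsilon}\right|_{\epsilon=0}.
\]

[Fig. 2.2 Set $D(y, r(p))_\epsilon$]

Next, for $p' > p$ and a subset $\Omega$ satisfying $\mu_{S^{2l-1}}(\Omega) = p$, we define the function
$f(p', \Omega)$ as $\mu_{S^{2l-1}}(\Omega_{f(p', \Omega)}) = p'$. Then, we have

\[
\frac{d f(p', \Omega)}{dp'} = \left( \left.\frac{d\mu_{S^{2l-1}}(\Omega_\epsilon)}{d\epsilon}\right|_{\epsilon = f(p', \Omega)} \right)^{-1} \le \left( \left.\frac{d\mu_{S^{2l-1}}(D(y, r(p))_\epsilon)}{d\epsilon}\right|_{\epsilon = f(p', D(y, r(p)))} \right)^{-1} = \frac{d f(p', D(y, r(p)))}{dp'},
\tag{2.225}
\]

which implies

\[
f(p', \Omega) \le f(p', D(y, r(p))).
\tag{2.226}
\]

Hence, we obtain

\[
\mu_{S^{2l-1}}(\Omega_{f(p', D(y, r(p)))}) \ge \mu_{S^{2l-1}}(\Omega_{f(p', \Omega)}) = \mu_{S^{2l-1}}(D(y, r(p))_{f(p', D(y, r(p)))}).
\]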


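Using the above lemma, we obtain the following lemma.

Lemma 2.10 ([27, Corollary 2.2]) When a subset $\Omega \subset S^{2l-1}$ satisfies $\mu_{S^{2l-1}}(\Omega) = \frac{1}{2}$,
we have

\[
\mu_{S^{2l-1}}(\Omega_\epsilon) \ge 1 - \frac{e^{-\epsilon^2 (l-1)}}{2}.
\tag{2.227}
\]

Proof Thanks to Lemma 2.9, since $\mu_{S^{2l-1}}(D(y, \frac{\pi}{2})) = \frac{1}{2}$, it is enough to show that $\mu_{S^{2l-1}}(D(y, \frac{\pi}{2})_\epsilon) = \mu_{S^{2l-1}}(D(y, \frac{\pi}{2} + \epsilon)) \ge 1 - e^{-\epsilon^2 (l-1)}/2$. The size of the boundary of $D(y, \theta)$ is proportional
to $\sin^{2l-2}\theta = \cos^{2l-2}(\theta - \frac{\pi}{2})$ for $\theta \in [0, \pi]$. Hence, choosing $\theta' := \theta - \frac{\pi}{2}$, we have

\[
\mu_{S^{2l-1}}\left(D\left(y, \frac{\pi}{2} + \epsilon\right)\right) = \frac{\int_{-\pi/2}^{\epsilon} \cos^{2l-2}\theta' \, d\theta'}{I_{l-1}},
\tag{2.228}
\]

where

\[
I_{l-1} := \int_{-\pi/2}^{\pi/2} \cos^{2l-2}\theta' \, d\theta' = B\left(l - \frac{1}{2}, \frac{1}{2}\right) = \frac{\Gamma(l - \frac{1}{2}) \Gamma(\frac{1}{2})}{\Gamma(l)} = \frac{2l-3}{2l-2} I_{l-2}.
\tag{2.229}
\]

Since $\frac{2l-3}{\sqrt{(2l-2)(2l-4)}} \ge 1$, we have

\[
\sqrt{2l-2} \, I_{l-1} = \frac{2l-3}{\sqrt{2l-2}} I_{l-2} \ge \sqrt{2l-4} \, I_{l-2},
\tag{2.230}
\]

which implies $\sqrt{2l-2} \, I_{l-1} \ge \sqrt{2} I_1 = \sqrt{2} B(\frac{3}{2}, \frac{1}{2}) = \frac{\pi}{\sqrt{2}}$.

For $t \in [0, \frac{\pi}{2}]$, the inequality $\cos t \le e^{-t^2/2}$ holds. Using the parameter $u := \sqrt{l-1} \, \theta'$, we have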


i D( x ) JF cos? odo! 1 Seger 098" ez 
—I 7 —_— € = —| 

4 Te JI —1 Ti 
u2 


2 
a = ae a EMIT _,2 
gee (c xin) du _ a en" du 


< grt 


v2 v2 
where the final inequality follows from Exercise 2.56. a 
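
The statement proved here, $1 - \mu_{S^{2l-1}}(D(y, \frac{\pi}{2} + \epsilon)) \le e^{-\epsilon^2(l-1)}/2$, can be sanity-checked numerically. The sketch below assumes, as in (2.228), that the cap measure has density proportional to $\cos^{2l-2}\theta'$, and evaluates both sides by a simple midpoint rule for a few hypothetical values of $l$ and $\epsilon$; it is a finite check, not a proof.

```python
import math


def integrate(g, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))


def cap_complement(l, eps):
    """1 - mu(D(y, pi/2 + eps)) on S^{2l-1}, using the density cos^{2l-2}."""
    def g(t):
        return math.cos(t) ** (2 * l - 2)
    return integrate(g, eps, math.pi / 2) / integrate(g, -math.pi / 2, math.pi / 2)


def levy_bound(l, eps):
    """Right-hand side e^{-eps^2 (l-1)} / 2 of the bound in Lemma 2.10."""
    return math.exp(-eps ** 2 * (l - 1)) / 2


results = [(l, eps, cap_complement(l, eps), levy_bound(l, eps))
           for l in (2, 3, 10, 50) for eps in (0.1, 0.3, 0.5, 1.0)]

for l, eps, tail, bound in results:
    print(f"l={l:3d} eps={eps:.1f} tail={tail:.3e} bound={bound:.3e}")
```

In each printed row the tail probability should lie below the bound, and the gap widens quickly as $l$ grows.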


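A real-valued continuous function $f$ on $S^{2l-1}$ can be regarded as a real-valued random
variable on $S^{2l-1}$. Then, we define the set $\Omega_f$ as

\[
\Omega_f := \{ x \in S^{2l-1} \mid f(x) \le \mathrm{Med}_{S^{2l-1}}(f) \},
\tag{2.231}
\]

where $\mathrm{Med}_{S^{2l-1}}(f)$ is the abbreviation of the median $\mathrm{Med}_{\mu_{S^{2l-1}}}(f)$ under the Haar
measure $\mu_{S^{2l-1}}$ on $S^{2l-1}$. Using Lemma 2.10, we obtain the inequality

\[
\mu_{S^{2l-1}}((\Omega_f)_\epsilon) \ge 1 - \frac{e^{-\epsilon^2 (l-1)}}{2}.
\tag{2.232}
\]

Now, we say that the function $f$ is Lipschitz continuous with the Lipschitz constant
$C_0$ with respect to the metric $d$ in a subset $\Omega \subset S^{2l-1}$ when

\[
\frac{|f(x) - f(y)|}{d(x, y)} \le C_0, \quad \forall x, y \in \Omega.
\tag{2.233}
\]

In particular, when $\Omega = S^{2l-1}$, we simply say that the function $f$ is Lipschitz continuous with the Lipschitz constant $C_0$ with respect to the metric $d$, which is assumed
in the following. Since $(\Omega_f)_\epsilon \subset \{ x \in S^{2l-1} \mid f(x) > \mathrm{Med}_{S^{2l-1}}(f) + C_0 \epsilon \}^c$, (2.232)
implies that

\[
\mu_{S^{2l-1}}\{ x \in S^{2l-1} \mid f(x) > \mathrm{Med}_{S^{2l-1}}(f) + C_0 \epsilon \} \le \mu_{S^{2l-1}}(((\Omega_f)_\epsilon)^c) \le \frac{e^{-\epsilon^2 (l-1)}}{2}.
\tag{2.234}
\]

Similarly, we can show that

\[
\mu_{S^{2l-1}}\{ x \in S^{2l-1} \mid f(x) < \mathrm{Med}_{S^{2l-1}}(f) - C_0 \epsilon \} \le \frac{e^{-\epsilon^2 (l-1)}}{2}.
\tag{2.235}
\]

Hence, we obtain

\[
\mu_{S^{2l-1}}\{ x \in S^{2l-1} \mid |f(x) - \mathrm{Med}_{S^{2l-1}}(f)| > C_0 \epsilon \} \le e^{-\epsilon^2 (l-1)},
\tag{2.236}
\]

which implies that the cumulative distribution function of the real-valued random
variable $|f(x) - \mathrm{Med}_{S^{2l-1}}(f)|$ is at least $F(x) := 1 - e^{-\frac{x^2 (l-1)}{C_0^2}}$. Now, we simplify
the expectation $\mathrm{E}_{\mu_{S^{2l-1}}}$ under the Haar measure $\mu_{S^{2l-1}}$ on $S^{2l-1}$ to $\mathrm{E}_{S^{2l-1}}$. Thus, Lemma
2.8 guarantees that

\[
\mathrm{E}_{S^{2l-1}} |f(X) - \mathrm{Med}_{S^{2l-1}}(f)| \le \int_0^\infty x \frac{dF}{dx}(x) \, dx = \int_0^\infty \frac{2(l-1) x^2}{C_0^2} e^{-\frac{(l-1) x^2}{C_0^2}} dx = \frac{\sqrt{\pi} C_0}{2\sqrt{l-1}},
\]

where we used the relation in Exercise 2.55. Thus, we obtain

\[
|\mathrm{E}_{S^{2l-1}} f(X) - \mathrm{Med}_{S^{2l-1}}(f)| \le \mathrm{E}_{S^{2l-1}} |f(X) - \mathrm{Med}_{S^{2l-1}}(f)| \le \frac{\sqrt{\pi} C_0}{2\sqrt{l-1}}.
\tag{2.237}
\]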
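
The constant in (2.237) comes from the Gaussian-type integral of Exercise 2.55, $\int_0^\infty 2c x^2 e^{-c x^2} dx = \frac{\sqrt{\pi}}{2\sqrt{c}}$ with $c = (l-1)/C_0^2$. The following sketch checks this identity numerically by a midpoint rule; the truncation point of the integral and the sample values of $c$, $l$, and $C_0$ are arbitrary illustration choices.

```python
import math


def integrate(g, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))


def lhs(c):
    """Numerical value of int_0^infty 2 c x^2 exp(-c x^2) dx (tail truncated)."""
    upper = 12 / math.sqrt(c)  # beyond this point the integrand is negligible
    return integrate(lambda x: 2 * c * x * x * math.exp(-c * x * x), 0.0, upper)


def rhs(c):
    """Closed form sqrt(pi) / (2 sqrt(c))."""
    return math.sqrt(math.pi) / (2 * math.sqrt(c))


def mean_deviation_bound(l, c0):
    """Bound sqrt(pi) C_0 / (2 sqrt(l-1)) on E|f - Med(f)| from (2.237)."""
    return rhs((l - 1) / c0 ** 2)


for c in (0.5, 1.0, 7.5):
    print(c, lhs(c), rhs(c))
print(mean_deviation_bound(101, 1.0))
```

For a Lipschitz constant $C_0 = 1$ and $l = 101$, the bound on the gap between mean and median is already below $0.09$.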


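Finally, given positive numbers $\delta$ and $C_1$, we define the sets

\[
\begin{aligned}
\Omega_{\delta, C_1} &:= \left\{ x \in S^{2l-1} \,\middle|\, f(x) \ge \mathrm{E}_{S^{2l-1}} f(X) + \frac{\sqrt{\pi} C_0}{2\sqrt{l-1}} + C_1 \delta \right\} \\
&\subset \{ x \in S^{2l-1} \mid f(x) \ge \mathrm{Med}_{S^{2l-1}}(f) + C_1 \delta \},
\end{aligned}
\]

\[
\begin{aligned}
\Omega'_{\delta, C_1} &:= \left\{ x \in S^{2l-1} \,\middle|\, \mathrm{E}_{S^{2l-1}} f(X) - \frac{\sqrt{\pi} C_0}{2\sqrt{l-1}} \le f(x) \le \mathrm{E}_{S^{2l-1}} f(X) + \frac{\sqrt{\pi} C_0}{2\sqrt{l-1}} + C_1 \delta \right\} \\
&\supset \{ x \in S^{2l-1} \mid \mathrm{Med}_{S^{2l-1}}(f) \le f(x) \le \mathrm{Med}_{S^{2l-1}}(f) + C_1 \delta \}.
\end{aligned}
\]

Then, we obtain the large deviation type bound with respect to the Haar measure on
the $(2l-1)$-dimensional sphere as follows.

Theorem 2.11 When the function $f(x)$ has the Lipschitz constant $C_1$ on the subset
$\Omega'_{\delta, C_1}$, we have

\[
\mu_{S^{2l-1}}(\Omega_{\delta, C_1}) \le \frac{e^{-(l-1)\delta^2}}{2}.
\tag{2.238}
\]

Here, $C_0$ is the Lipschitz constant for the whole set, and $C_1$ is the Lipschitz constant
for the specific subset $\Omega'_{\delta, C_1}$.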

Next, we apply the Haar measure to construct a proper subset of $S^{2l-1}$. A subset
$\Omega$ of $S^{2l-1}$ is called an $\epsilon$ net of $S^{2l-1}$ when for any element $x \in S^{2l-1}$, there exists
an element $y \in \Omega$ such that $d(x, y) \le \epsilon$.

Lemma 2.11 There exists an $\epsilon$ net $\Omega$ of $S^{2l-1}$ whose cardinality satisfies

\[
|\Omega| \le \frac{\sqrt{2l-1} \, \pi}{\sin^{2l-1}\frac{\epsilon}{2}}.
\]

Proof We choose a subset $\Omega$ of $S^{2l-1}$ satisfying the condition that $d(x, y) > \epsilon$ for
any two distinct elements $x, y \in \Omega$. We choose the subset $\Omega$ so that no subset $\Omega'$
strictly larger than $\Omega$ satisfies the required condition. Here, a set $\Omega'$ is called strictly
larger than $\Omega$ when $\Omega'$ contains $\Omega$ and there is at least one element of $\Omega'$ that is not
included in $\Omega$. A rigorous proof of the existence of such a subset can be given by
using Zorn's lemma.

Hence, for any element $x \in S^{2l-1}$, there exists an element $y \in \Omega$ such that
$d(x, y) \le \epsilon$. That is, the set $\Omega$ is an $\epsilon$ net of $S^{2l-1}$. Due to the construction,
$D(x, \epsilon/2) \cap D(y, \epsilon/2) = \emptyset$ for any two distinct elements $x, y \in \Omega$. Thus, $|\Omega| \mu_{S^{2l-1}}(D(x, \epsilon/2)) = \sum_{x \in \Omega} \mu_{S^{2l-1}}(D(x, \epsilon/2)) \le 1$. That is, $|\Omega| \le \frac{1}{\mu_{S^{2l-1}}(D(x, \epsilon/2))}$. The probability $\mu_{S^{2l-1}}(D(x, \epsilon/2))$ is evaluated by using Exercise 2.57 as

\[
\begin{aligned}
\mu_{S^{2l-1}}(D(x, \epsilon/2)) &= \frac{\int_0^{\epsilon/2} \sin^{2l-2}\theta \, d\theta}{I_{l-1}} \ge \frac{\int_0^{\epsilon/2} \left( \frac{\sin\frac{\epsilon}{2}}{\epsilon/2} \theta \right)^{2l-2} d\theta}{I_{l-1}} \\
&= \frac{\frac{\epsilon}{2} \sin^{2l-2}\frac{\epsilon}{2}}{(2l-1) I_{l-1}} \ge \frac{\sin^{2l-1}\frac{\epsilon}{2}}{(2l-1) I_{l-1}} \ge \frac{\sin^{2l-1}\frac{\epsilon}{2}}{\sqrt{2l-1} \, \pi},
\end{aligned}
\]

where the relation $\frac{\epsilon}{2} \ge \sin\frac{\epsilon}{2}$ is used. ∎
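
The chain of inequalities in this proof can be checked numerically. The sketch below evaluates the cap measure $\mu_{S^{2l-1}}(D(x, \epsilon/2)) = \int_0^{\epsilon/2} \sin^{2l-2}\theta \, d\theta / B(l - \frac{1}{2}, \frac{1}{2})$ by a midpoint rule and compares it with the lower bound $\sin^{2l-1}(\epsilon/2) / (\sqrt{2l-1}\,\pi)$, whose reciprocal bounds the size of the net. The function names and the sample values of $l$ and $\epsilon$ are hypothetical illustration choices.

```python
import math


def integrate(g, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))


def beta_half(l):
    """B(l - 1/2, 1/2) = Gamma(l - 1/2) Gamma(1/2) / Gamma(l), the normalization."""
    return math.gamma(l - 0.5) * math.gamma(0.5) / math.gamma(l)


def cap_measure(l, r):
    """Haar measure of the cap D(x, r) on S^{2l-1}."""
    return integrate(lambda t: math.sin(t) ** (2 * l - 2), 0.0, r) / beta_half(l)


def cap_lower_bound(l, eps):
    """Lower bound sin^{2l-1}(eps/2) / (sqrt(2l-1) pi) on mu(D(x, eps/2))."""
    return math.sin(eps / 2) ** (2 * l - 1) / (math.sqrt(2 * l - 1) * math.pi)


def net_size_bound(l, eps):
    """Resulting upper bound on the cardinality of an eps net."""
    return 1.0 / cap_lower_bound(l, eps)


cases = [(l, eps) for l in (1, 2, 5) for eps in (0.2, 0.5, 1.0)]
for l, eps in cases:
    print(l, eps, cap_measure(l, eps / 2), cap_lower_bound(l, eps), net_size_bound(l, eps))
```

The bound grows like $(2/\epsilon)^{2l-1}$ for small $\epsilon$, that is, exponentially in the dimension.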
Exercises 


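2.52 Show that $\| |x\rangle\langle x| - |y\rangle\langle y| \|_1 \le 2 \sin \epsilon$ when $d(x, y) = \epsilon \le \frac{\pi}{2}$ and $x, y \in S^{2l-1}$.

2.53 Show that $\| |x\rangle\langle x| - |y\rangle\langle y| \|_2 \le \sqrt{2} \, d(x, y)$.

2.54 Show that $\| |x\rangle - |y\rangle \| \le 2 \sin\frac{d(x, y)}{2} \le d(x, y)$.

2.55 Show that $\int_0^\infty 2c x^2 e^{-c x^2} dx = \frac{\sqrt{\pi}}{2\sqrt{c}}$.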


pee eae 
2.56 Show “= « e-°"-) /2 when u > Oande > 0. 


v2 


2.57 Show that

\[
(2l-1) B\left(l - \frac{1}{2}, \frac{1}{2}\right) \le \sqrt{2l-1} \, \pi
\tag{2.239}
\]

by following the steps below.

(a) Show the equation $B(l - \frac{1}{2}, \frac{1}{2}) = \pi \prod_{k=1}^{l-1} \frac{k - \frac{1}{2}}{k}$.

(b) Show the inequality $\sum_{k=1}^{l-1} \log \frac{k - \frac{1}{2}}{k} \le -\frac{1}{2} \log(2l-1)$.
(c) Show the inequality (2.239).

2.7 Related Books


In this chapter, we treat several important topics in information science from the probabilistic viewpoint. In Sect. 2.1, information quantities, e.g., entropy, relative entropy,
mutual information, Rényi entropy, and conditional Rényi entropy, are discussed. The
discussion and its historical notes, except for those on Rényi entropy and conditional Rényi
entropy, appear in Chap. 2 of Cover and Thomas [28]. Conditional Rényi entropy
was recently introduced and has been discussed in several papers [29-31] from various viewpoints. This quantity will be investigated much more deeply in the future.

Section 2.2 focuses on information geometry. Amari and Nagaoka [2] is a textbook 
on this topic written by the pioneers in the field. Bregman divergence plays a central 
role in this section. Although their book [2] contains the Bregman divergence, it 
discusses information geometry from a more general viewpoint. Amari's recent
paper [6] focuses on the Bregman divergence and derives several important theorems
only from the structure of the Bregman divergence. This section follows his derivation.

Section 2.3 briefly treats the estimation theory of probability distribution families. 
Lehmann and Casella [32] is a good textbook covering all of estimation theory. For 
a more in-depth discussion of its asymptotic aspect, see van der Vaart [7]. 

Section 2.4.1 reviews the type method, which was formulated by Csiszár and
Körner [8]. Section 2.4.2 treats the large deviation theory including estimation theory.
Its details are given in Dembo and Zeitouni [33] and Bucklew [34]. In this book, we
give a proof of Cramér's theorem and the Gärtner-Ellis theorem. In fact, (2.163),
(2.165), (2.169), and (2.171) follow from Markov's inequality. However, the opposite
parts are not simple. Many papers and books give their proofs. In this book, we prove
these inequalities by combining the estimation of the exponential theory and the
Legendre transform. This proof seems to be the simplest of the known proofs.

Section 2.5 explains how to derive the entropy from natural axioms. This section
addresses two sets of axioms. One is close to the axioms proposed by Khinchin [25].
The other is related to asymptotic continuity, and has not been given anywhere else.
The latter is related to the entropy measure discussed in Sect. 8.7.

Section 2.6 focuses on the Haar measure, which is a natural distribution on the
set of pure states. Milman and Schechtman [27] discuss the asymptotic behavior
of a function of the random variable subject to the Haar measure. Since this type
of discussion has recently attracted much attention in quantum information and is applied in
Sect. 8.13, Sect. 2.6 is devoted to this topic.


2.8 Solutions of Exercises 


Exercise 2.1 When $y = f(x)$, $P_{XY}(x, y) = P_X(x)$. Hence, $H(X, f(X)) = -\sum_{(x,y): y = f(x)} P_{XY}(x, y) \log P_{XY}(x, y) = -\sum_x P_X(x) \log P_X(x) = H(X)$.


Exercise 2.2 Consider the case $P_Y(1) = \lambda$, $P_Y(0) = 1 - \lambda$, $P_{X|Y=1} = p$, $P_{X|Y=0} = p'$.

Exercise 2.3 The concavity of entropy guarantees that the maximum of $H(p)$ under
the above condition is realized by the distribution $(a, \frac{1-a}{k-1}, \ldots, \frac{1-a}{k-1})$, whose entropy
is $h(a) + (1 - a) \log(k - 1)$.


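Exercise 2.4 $H(p_A \times p_B) = -\sum_{\omega_A, \omega_B} p_A(\omega_A) p_B(\omega_B) \log(p_A(\omega_A) p_B(\omega_B)) = -\sum_{\omega_A} p_A(\omega_A) \log p_A(\omega_A) - \sum_{\omega_B} p_B(\omega_B) \log p_B(\omega_B) = H(p_A) + H(p_B)$.

Exercise 2.5 $D(p_A \times p_B \| q_A \times q_B) = \sum_{\omega_A, \omega_B} p_A(\omega_A) p_B(\omega_B) \left( \log(p_A(\omega_A) p_B(\omega_B)) - \log(q_A(\omega_A) q_B(\omega_B)) \right) = \sum_{\omega_A} p_A(\omega_A) (\log p_A(\omega_A) - \log q_A(\omega_A)) + \sum_{\omega_B} p_B(\omega_B) (\log p_B(\omega_B) - \log q_B(\omega_B)) = D(p_A \| q_A) + D(p_B \| q_B)$.

Exercise 2.6 Define $f(x) := \log x - (x - 1)$. Since $f'(x) = \frac{1}{x} - 1$, we find that the
maximum of $f(x)$ is attained only when $x = 1$. That is, $f(x) \le f(1) = 0$.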


Exercise 2.7 Apply a stochastic transition matrix of rank 1 to Theorem 2.1. 
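Exercise 2.8 $D_f(p \| q) = \sum_i p_i \left( 1 - \sqrt{\frac{q_i}{p_i}} \right) = 1 - \sum_i \sqrt{p_i q_i} = \frac{1}{2} \sum_i (\sqrt{p_i} - \sqrt{q_i})^2$.

Exercise 2.9 Use the fact that $\sum_i \sum_j Q^i_j |p_i - q_i| \ge \sum_j \left| \sum_i Q^i_j (p_i - q_i) \right|$.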
Exercise 2.10 Consider the x > y and x < y cases separately. 


Exercise 2.11 

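(a) Use $|p_i - q_i| = |\sqrt{p_i} - \sqrt{q_i}| \, |\sqrt{p_i} + \sqrt{q_i}|$.

(b) Use $p_i + q_i \ge 2\sqrt{p_i q_i}$.

Exercise 2.12 We find that $p_{x_0} - q_{x_0} = -\sum_{x \ne x_0} (p_x - q_x)$. Thus, $|p_{x_0} - q_{x_0}| \le \sum_{x \ne x_0} |p_x - q_x|$. Hence, $d_1(p, q) = \frac{1}{2} |p_{x_0} - q_{x_0}| + \frac{1}{2} \sum_{x \ne x_0} |p_x - q_x| \le \sum_{x \ne x_0} |p_x - q_x|$.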


Exercise 2.13 Assume that the datum $i$ is generated with the probability distribution $p_i$.
Apply Jensen's inequality to the random variable $\sqrt{q_i / p_i}$ and the convex function
$-\log x$.

Exercise 2.14 


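(a) Since Schwarz's inequality implies that $\|x\| \|y\| \ge \langle x, y\rangle$ and $\|x\| \|y\| \ge \langle y, x\rangle$,
we have

\[
(\|x\| + \|y\|)^2 - (\|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2) = 2\|x\| \|y\| - \langle x, y\rangle - \langle y, x\rangle \ge 0.
\]

(b)
\[
(\|x\| + \|y\|)^2 \ge \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2 = \|x + y\|^2.
\]

(c) Substitute $\sqrt{p_i} - \sqrt{r_i}$ and $\sqrt{r_i} - \sqrt{q_i}$ into $x$ and $y$ in the inequality given in (b).

Exercise 2.15 Check that

\[
\phi'(s|p\|q) = \frac{\sum_i p_i^{1-s} q_i^{s} (\log q_i - \log p_i)}{\sum_i p_i^{1-s} q_i^{s}}.
\]

Exercise 2.16 Check that

\[
\phi''(s|p\|q) = \frac{\left( \sum_i p_i^{1-s} q_i^{s} \right) \left( \sum_i p_i^{1-s} q_i^{s} (\log q_i - \log p_i)^2 \right) - \left( \sum_i p_i^{1-s} q_i^{s} (\log q_i - \log p_i) \right)^2}{\left( \sum_i p_i^{1-s} q_i^{s} \right)^2}.
\]

Next, use Schwarz's inequality between two vectors $1$ and $(-\log p_i + \log q_i)$.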


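Exercise 2.17 For $0 < s < s'$, we have $\frac{s}{s'} f(s') = (1 - \frac{s}{s'}) f(0) + \frac{s}{s'} f(s') \ge f((1 - \frac{s}{s'}) \cdot 0 + \frac{s}{s'} \cdot s') = f(s)$, which implies that $\frac{f(s')}{s'} \ge \frac{f(s)}{s}$. Similarly, for $0 > s > s'$,
we have $\frac{f(s')}{s'} \le \frac{f(s)}{s}$. Thus, $\frac{f(s)}{s}$ is monotone increasing. When $f(s)$ is strictly convex
for $s$, the above inequalities $\le$ and $\ge$ can be replaced by $<$ and $>$. Hence, $\frac{f(s)}{s}$ is
strictly monotone increasing.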

Exercise 2.18 

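(a) For simplicity, we denote $\max(b_1, \ldots, b_k)$ by $b_M$. We choose a subset $S \subset \{1, \ldots, k\}$ such that $b_M = b_i$ for $i \in S$ and $b_M > b_i$ for $i \notin S$. Thus, $\frac{1}{t} \log\left( \sum_i a_i b_i^t \right) = \log b_M + \frac{1}{t} \log\left( \sum_{i \in S} a_i + \sum_{i \notin S} a_i \left( \frac{b_i}{b_M} \right)^t \right) \to \log b_M$ as $t \to \infty$.

Exercise 2.19 $\sum_i p_i^{1-s} q_i^{s} = \sum_{i: p_i > 0} p_i^{1-s} q_i^{s} \to \sum_{i: p_i > 0} q_i$ as $s \to 1$.

Exercise 2.20 Solve the equation stating that the partial derivative equals zero on the RHS.
Then, we obtain $\lambda_i = p_i / q_i$. Substituting it into the RHS, we obtain the LHS.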


Exercise 2.21 Apply the formula (2.32) to the conditional distribution Pyyzjyau. 
Then, we have 


IOAYZ =) 1 ZU HO) + > Pee y|Z $2.0 =a). 
; (2.240) 


Taking the expectation for U, we obtain (2.33). 


Exercise 2.22 $e^{\psi(s|p_A \times p_B)} = \sum_{a,b} p_A(a)^{1-s} p_B(b)^{1-s} = \sum_a p_A(a)^{1-s} \sum_b p_B(b)^{1-s} = e^{\psi(s|p_A)} e^{\psi(s|p_B)}$.


Exercise 2.23 


(b) 


1 
D(q\|p) - 7a, Pallps) 


1 
= D210) (log q(x) — log p(x) — —— P74 (x) log g(a) 
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+ Daeg re) 5 it a 
=- Toy Daosaey + 7 — 
= ve) = ve) 
a a = Ps)+5 
=- a ~ s)log pl) -— ys) + 


SF > ps(x) log p(x) + Ws) = D(ps||p). 


(c) The desired inequality follows from the inequality ~ D(q lps) = Ofors <1. 


Exercise 2.24 


(a) It follows from ~(s) > H(p) for s € [0, 1]. 

(b) The left hand side is zero when s = 0. 

(©) W(s) =—D, ps2) log p(x), H(ps) = —U— 5), psx) log p(x) + Ws), 
and D(ps||p) = >, Ps(x) log p(x) — Ws). 

(e) It follows from the relations 4 H (ps) < Oand H(p)) = H(p) < R. 

(f) It follows from Exercise 2.23. 

(g) It follows from (f) and the continuity of H(q) and D(q||p) for q. 

(h) Since (sx) = (H (Ps) — W(se))/(1 — Se) = (R — Use))/ — 5p); 


we Pee D(Ds,ll P) = Srv’ (Sr) — Vr) = Sx(R — Yr))/C — 5k) — Vr) 
= SRAT SR) a = 


(j) When s = sp, Be 0. Further, since 4(R + (s — Dy'(s) — u(s)) 


= (s — 1)w)"(s) > 0, ST > 0 fors > sp and ST < 0 for 
S < Sp. Hence, the maximum a shee) 
(k) Combine (g), (h), and (j). 


can be realized with s = sp. 


Exercise 2.25 


(a) See (e) of Exercise 2.24. 

(b) It follows from Exercise 2.23. 
(c) See (g) of Exercise 2.24. 

(d) See (h) of Exercise 2.24. 

(e) See (j) of Exercise 2.24. 

(f) Combine (c), (d), and (e). 


—Ss —s) 


attained with s > —oo. 


Exercise 2.26 — Us) _ (s- pes (vis) = 7 “Eat <0. Hence, the supremum is 
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Exercise 2.27 Since — log max; p; < Ha(p) = Hmin(p) < Ha(p) < Hmax(p), it 


is enough to show Hmax(p) < — log min; p;. This inequality is equivalent with 


min; Pi = TqpSOT- 


Exercise 2.28 Equation (2.72) can be shown by a simple calculation. Equation (2.73) 
is shown by the following way. 


log |4| — ng D(Pxy|l Pmix.x x Qy) = H(X|Y) — mn D(Py||Qy). = H(X|Y). 
Y yy: 


Exercise 2.29 Due to (2.74), we have 
: d l+s 
lim Hiss(XIY) =~ D0) DPrvray() ls-0 = H(X|Y). 


Due to (2.74), we have 


lim Hi}, ,(X1Y) = mex — FDP y)'* Oy(y)Ss= 


= max — D/Px.y(x, y)(logPx.y(x, y) — log Ov(y)) = A(XIY). 


Exercise 2.30 The second expression in (2.74) yields (2.83) and (2.85). (2.81) yields 
(2.84) and (2.86). 


Exercise 2.31 The concavity of s > s H\+4;(X|Y) can be shown from the convexity of 
st> Di45(p||q)(Exercise 2.16). Since the function s  Dj+5(Pxy|l Pmix.v X Qy) 
is convex, the function s +> ming, Di+5(Pxy || Pmix,x X Qy) is also convex. Hence, 


the function s > sH} (X|Y) is concave. Similar to Exercise 2.17, we can show 


l+s 
that the functions s +> Ay+;(X|Y) and H, ce |Y) are monotonicallly decreasing. 


Exercise 2.32 Due to the equality condition of Hélder inequality, the equality in (2.88) 
holds if and only if there exists a function c(y) such that Pyjy—y(x) = c(y)Pxy(x, y), 
which implies that Pyy(x, y)~*/"- = c(y)Py(y). Hence, we obtain Pxy (x,y) = 
c(y)~U-9/8 Py (y)~C—-)/5, This condition is equivalent to Pyy (x, y) = al + Py(y). 


Exercise 2.33 We denote the marginal distributions of X and Y py and py respec- 

tively. Then, Cov,(X,Y) = = y BQ, y(K— E,X)(¥ —E,Y) = = Dery Px) 

Exercise 2.34 For i # j, we have 3), Pali): ++ po(wn ) Atos pole) sles poy) = 
dlog pj Wi, Ln lsd 

Xn)( 7g) 


ey Wn Pp(1, ear 
a : etal n 
= eae Wn Po(w1) ae Pon \( 08 po(w1) 7 og pow dy2 


ern 


dl dl n 
= Duvrpenine POW) + Pon) (PEED fo + SOEGA? 


anaes 
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iia 


ma pam ae w, PoW1) +++ PolWn) ye (Peezeen y2 
= = die 1 le Pow; )( Pere) 2 = nJo. 


Exercise 2.35 Use the approximation 


V Po+e(W) = VPaw@y/ 1 + lywyet 5 Te Bala e. 


Exercise 2.36 


dl dl i) d log py(w;) 
=>. wo, Pow) *- - po(Wn) Sor st ee ae os Poli) 8 Pa Wj 


(a) It follows from the Taylor expansion of pp.-(w) for e. 
(b) Since “ ERO = 4 pol) J yy (w) — (422 J pg(w))?, we have 
2 po(w) LEP 2 =O, po(w)(— cnt / po(w))? + £2 / pyw)) 
== ala) tee /po(w))? = —Jg. Thus, D(poll poe) 
= >, po(w) (log pe(w) — log pore(w)) = — YX, pow) (eee + 
1d 10g Pole) e2) 
= 05 polw) & H og pale) 2 
3 po(w) EPO) ¢ — 5 >, po(w) Ee 
=-5 >>, Po n(w) po 2 = = Jy? : 
(d) D(po+ell po) = >, Po+e(w) log po+e(w) — log po(w)) 
~ =>, (po(w) + aro) e 4 I 1 ad’ pow) 2) (SPER) +} 1 d? log pow) a) 


|| I 


Lo d20 

~ d\ 1 d71 

=>, pow) (a og ple) ey 1 eee) Oy doy) Heep ¢ 
= =>, pow)4 ad Plogpt 2)+>d.,, dente) dog Pol) ¢ = = = Ipe _ 1 ye = 1 Ie, 
Exercise 2.37 e?!Pellpor) = pie po(w)!-s Pote(w)® 
=D, po(w)!*(po(w) + See + 1 Lpelw) ¢2ys 
= E, pol “py lay" + Mike py(ia)he + 4 LB pgs)! 
~ d 
= >. Pew) + s( APO) Ho (w)e +3 a? Pate) ng (w)~ 1 €) a s(s— a 


po(w)'e)”) 
= 14D, po(w)s( 2 pow) te + Y, pow) LA pow) 2 
A. se > pole) ( L242)? pple) 2 2 
a OEY, Pow) (SP PE = 14+ SPP Jy. Thus, 9(s| poll Pore) = log 
d+ + 28 i) Jy ) ~ see 1) Jp. 


Exercise 2.38 For arbitrary 7 and 1’ , and a real number  € (0, 1), we choose 4p 
such that maxg >°, (An + (1 — ry OX — p0) = > ,An + A - dyno — (6). 
Hence, 
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v(An + (= A!) = DLOAm + (1 — ANH - 1) 
k 


=) >" m5 — uO) + A=) >) 5 — 1) 
k k 


<A max >) m6" — 4(6) + (1 — A) max D7 m6" — WB) = Av(m) + 1 — Av). 
oe Se 


Exercise 2.39 Choose the generator $-\log p(x)$. Then, the set $\{p_s(x)\}$ is an exponential family generated by $-\log p(x)$. The set $\{q \mid H(q) = H(p_s)\}$ is a mixture family
generated by $-\log p(x)$. So, Theorem 2.3 directly solves Exercise 2.23.


Exercise 2.40 


(a) Since 7(@) = Due Po(w)X ), is an unbiased estimator. 
(b) Since # log pow) = 92-5 log pow) = (G4)! G log pow) = (Jo) 
log po(w), the Fisher information for 7 is Jo(Jo)~ 2 = =Jy im Then, the lower bound 
of the variance of unbiased estimator given by Cramer-Rao inequality is Jp. The 
variance oY X is also Jo. 
(c) Use # = an 
(d) Since 4 “= 7, we have . = er = Jn. Taking the integral, we obtain the 
desired equation, 
(e) Inequality (2.140) is derived by Schwartz inequality. Since |(X — 7, ly) p,| = 1, 
the equality condition is 2 = X—7. 
(f) Replace J, by 4 2a. ae Pn: We obtain = — = In(X — 1) Pn: 
(g) Define 0 := f oe dy, and (6a) “= = fn Jy an. 
au) = = 4) a ~ nn 8y-1 = Jada =7. The function log ¥.. 
DP, (wel w) ‘ie taacnee the same differential equation. Due to the uniqueness of 
the solution of the differential equation, we have 4((7)) = log ©, p,(wye*™. 
Since S18 Pa = Bt he = J,(X —) = J,X —nJ,, we have log p, = 6 


n dn 
X — (0(n)). Hence, we have p, = e?*-#, 


Exercise 2.41 Show that Pare) = Oif and only if 7(0) = X(w). 


Exercise 2.42 Combine (2.13) and (2.105). 


Exercise 2.43 The case of n > m can be obtained from n,n —1,...,m+1>m. 
The n < m case may be obtained from — u ie ee < a 
m? m—1 n+l n 


Exercise 2.44 E, X = >; pixi > paneer DixXi = ee Di: 
Exercise 2.45 Apply Cramér’s theorem to the random variable log p;. 


Exercise 2.46 Equation (2.189) implies that 


e 1 Ceyn i nR 
lim —— log P*(p", e””) 
n>o Nn 


d ,nR 
< lim ening max mee ee ) 
senk 


n>o Nn get 2 en H(q) 


(n+ 

= Pa D(p\lq). 
Combining (2.55), we obtain the < part of (2.188).
Exercise 2.47 


1 1 et Pallp) 
lim —— log P(p", e"*) < lim —-log max © ————— 
no Nn n>0o NN ~~ qeT,:H(q)<R (n + 1)4 


= D 
, ee (p\lq). 


Combining (2.65), we obtain the < part of (2.188).


Exercise 2.48 Since p,(O)e"?“ + p,(le"~? = e-"e"™ + (1 — ee", we 
have (6) = limy-+o0 1 log(e“"en4 + (1 —e-")e"%) = —6b for 0 < ao and 


uO) = —a(1 — 0) for 6= ae Hence, we obtain y(@) = — min{(1 — @)a, 0d}. 
Since —b <x <a, we have maxg.o(x8 — w(A)) = max (max -«, 9>0 («8 + Ob), 
maxg> +, (x6 +a(1—@))) =max((x + b)45,a+@-a))= max (22+) 


+b’ 
a(x+b) )= wn 
a+b 


On the ne hand, since a > x > —b, limy-soo + | log Pr{Xn = X} = limn+0 1 
log p,(0) = limnsoo 2 a log e~ 


<d. 


nd —q, 


Exercise 2.49 


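(b) Since $e^{-H_{\min}(p_{d,\epsilon})} = \frac{1}{d} + \epsilon$, we have $H_{\min}(p_{d,\epsilon}) = \log d - \log(1 + d\epsilon)$.

(c) Since $H_{\min}(p_{\mathrm{mix},d}) - H_{\min}(p_{d,\epsilon}) = \log(1 + d\epsilon) = (\log d + \log \epsilon) + O(\frac{1}{d\epsilon})$, we
have $\frac{H_{\min}(p_{\mathrm{mix},d}) - H_{\min}(p_{d,\epsilon})}{\log d} = 1 + \frac{\log \epsilon}{\log d} + O\left(\frac{1}{d\epsilon \log d}\right)$.

(d) Since $e^{-s H_{1+s}(p_{d,\epsilon})} = (\frac{1}{d} + \epsilon)^{1+s} + (d-1)(\frac{1}{d} - \frac{\epsilon}{d-1})^{1+s}$, we have $H_{1+s}(p_{d,\epsilon}) = -\frac{1}{s} \log\left( (\frac{1}{d} + \epsilon)^{1+s} + (d-1)(\frac{1}{d} - \frac{\epsilon}{d-1})^{1+s} \right) = \log d - \frac{1}{s} \log\left( \frac{1}{d}(1 + d\epsilon)^{1+s} + \frac{d-1}{d}\left(1 - \frac{d\epsilon}{d-1}\right)^{1+s} \right)$.

(e) Since $H_{1+s}(p_{\mathrm{mix},d}) - H_{1+s}(p_{d,\epsilon}) = \frac{1}{s} \log\left( \frac{1}{d}(1 + d\epsilon)^{1+s} + \frac{d-1}{d}\left(1 - \frac{d\epsilon}{d-1}\right)^{1+s} \right) = \frac{1}{s} \log\left( \frac{1}{d}(1 + d\epsilon)^{1+s} \right) + O(d (d\epsilon)^{-(1+s)}) = -\frac{\log d}{s} + \frac{1+s}{s} \log(d\epsilon) + O(d (d\epsilon)^{-(1+s)}) = \log d + \frac{1+s}{s} \log \epsilon + O(d (d\epsilon)^{-(1+s)})$, we have $\frac{H_{1+s}(p_{\mathrm{mix},d}) - H_{1+s}(p_{d,\epsilon})}{\log d} = 1 + \frac{(1+s) \log \epsilon}{s \log d} + O\left(\frac{d (d\epsilon)^{-(1+s)}}{\log d}\right)$.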
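
Parts (b) and (c) can be illustrated numerically. The sketch below uses the closed form $H_{\min}(p_{d,\epsilon}) = -\log(\frac{1}{d} + \epsilon)$ with the hypothetical choice $\epsilon = d^{-1/2}$: the distance $d_1(p_{d,\epsilon}, p_{\mathrm{mix},d}) = \epsilon$ then goes to zero, while the entropy gap divided by $\log d$ stays near $\frac{1}{2}$, which is the behavior that rules out A4 (Asymptotic continuity) for $H_{\min}$.

```python
import math


def h_min_gap_ratio(d, eps):
    """(H_min(p_mix,d) - H_min(p_{d,eps})) / log d, from the closed forms in (b)."""
    h_mix = math.log(d)                # H_min of the uniform distribution
    h_pert = -math.log(1 / d + eps)    # H_min(p_{d,eps}) = -log max_i p_{d,eps}(i)
    return (h_mix - h_pert) / math.log(d)


# Hypothetical scaling eps = d^{-1/2}: d * eps -> infinity while eps -> 0.
ratios = {d: h_min_gap_ratio(d, d ** -0.5) for d in (10**2, 10**4, 10**6, 10**8)}

for d, ratio in ratios.items():
    approx = 1 + math.log(d ** -0.5) / math.log(d)  # leading terms from (c)
    print(d, d ** -0.5, ratio, approx)
```

The printed ratio approaches the leading term $1 + \frac{\log \epsilon}{\log d} = \frac{1}{2}$ as $d$ grows.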


Exercise 2.50 


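(b) Since the cardinality of the support of $p'_{d,\epsilon}$ is $d$, we have $H_{\max}(p'_{d,\epsilon}) = \log d$. Thus,
$\frac{H_{\max}(p'_{d,\epsilon}) - H_{\max}(p'_{d,0})}{\log d} = 1$ for $\epsilon > 0$.

(c) Since $e^{s H_{1-s}(p'_{d,\epsilon})} = (1-\epsilon)^{1-s} + (d-1)\left(\frac{\epsilon}{d-1}\right)^{1-s}$, we have $H_{1-s}(p'_{d,\epsilon}) = \frac{1}{s} \log\left( (1-\epsilon)^{1-s} + (d-1)\left(\frac{\epsilon}{d-1}\right)^{1-s} \right)$.

(d) Since $(1-\epsilon)^{1-s} + (d-1)\left(\frac{\epsilon}{d-1}\right)^{1-s} = 1 - (1-s)\epsilon + (d-1)^{s} \epsilon^{1-s} + O(\epsilon^2) = 1 + (d-1)^{s} \epsilon^{1-s} + O(\epsilon)$, we have $H_{1-s}(p'_{d,\epsilon}) = \frac{1}{s} \log\left( 1 + (d-1)^{s} \epsilon^{1-s} + O(\epsilon) \right) = \frac{(d-1)^{s} \epsilon^{1-s}}{s} + O(\epsilon) + O(\epsilon^{2(1-s)})$. Thus, $\frac{H_{1-s}(p'_{d,\epsilon}) - H_{1-s}(p'_{d,0})}{\log d} = \frac{(d-1)^{s} \epsilon^{1-s}}{s \log d} + O(\epsilon) + O(\epsilon^{2(1-s)})$ as $\epsilon \to 0$.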


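Exercise 2.51 We have

\[
\begin{aligned}
\left| e^{-H_2(p)} - e^{-H_2(q)} \right| &= \left| \left( e^{-H_2(p)} - 2c + d c^2 \right) - \left( e^{-H_2(q)} - 2c + d c^2 \right) \right| \\
&= \left| \sum_i (p_i - c)^2 - \sum_i (q_i - c)^2 \right| = \left| \sum_i (p_i - q_i)(p_i + q_i - 2c) \right| \\
&\le \sum_i |p_i - q_i| \, |p_i + q_i - 2c| \le \left( \sum_i |p_i - q_i| \right) \max_i |p_i + q_i - 2c| \\
&= 2 d_1(p, q) \max_i |p_i + q_i - 2c|.
\end{aligned}
\]

Since $\min_c \max_i |p_i + q_i - 2c| \le 1$, we obtain $|e^{-H_2(p)} - e^{-H_2(q)}| \le 2 d_1(p, q)$,
which implies (2.209).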
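
The dimension-free inequality (2.209) can also be checked on concrete distributions. The sketch below uses the family $p_{d,\epsilon}(0) = \frac{1}{d} + \epsilon$, $p_{d,\epsilon}(i) = \frac{1}{d} - \frac{\epsilon}{d-1}$ for $i > 0$ from Exercise 2.49 against the uniform distribution; the helper names and the choice $\epsilon = \frac{1}{2\sqrt{d}}$ are hypothetical illustration values.

```python
def collision_prob(p):
    """e^{-H_2(p)} = sum_i p_i^2."""
    return sum(x * x for x in p)


def l1_distance(p, q):
    """Variational distance d_1(p, q) = (1/2) sum_i |p_i - q_i|."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def perturbed(d, eps):
    """The distribution p_{d,eps} of Exercise 2.49 on {0, ..., d-1}."""
    return [1 / d + eps] + [1 / d - eps / (d - 1)] * (d - 1)


checks = []
for d in (2, 10, 1000, 100000):
    eps = 0.5 / d ** 0.5
    p, q = perturbed(d, eps), [1 / d] * d
    gap = abs(collision_prob(p) - collision_prob(q))
    checks.append((d, gap, 2 * l1_distance(p, q)))

for d, gap, bound in checks:
    print(d, gap, bound)
```

In every row the gap of $e^{-H_2}$ stays below $2 d_1(p, q)$, independently of the alphabet size $d$.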


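Exercise 2.52 It is enough to show that $\| |x\rangle\langle x| - |y\rangle\langle y| \|_1 = 2 \sin \epsilon$ when $|\langle x|y\rangle| = \cos \epsilon$. When the state $|x\rangle\langle x|$ is written as $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, the other state $|y\rangle\langle y|$ is written as
$\begin{pmatrix} \cos^2 \epsilon & \cos \epsilon \sin \epsilon \\ \cos \epsilon \sin \epsilon & \sin^2 \epsilon \end{pmatrix}$. Hence,

\[
|x\rangle\langle x| - |y\rangle\langle y| = \begin{pmatrix} 1 - \cos^2 \epsilon & -\cos \epsilon \sin \epsilon \\ -\cos \epsilon \sin \epsilon & -\sin^2 \epsilon \end{pmatrix}.
\]

Solving the characteristic equation, we obtain the eigenvalues $\pm \sin \epsilon$. Thus, we have
$\| |x\rangle\langle x| - |y\rangle\langle y| \|_1 = 2 \sin \epsilon$.

Exercise 2.53 It is enough to show the same case as Exercise 2.52. Since the eigenvalues of $|x\rangle\langle x| - |y\rangle\langle y|$ are $\pm \sin \epsilon$, we have $\| |x\rangle\langle x| - |y\rangle\langle y| \|_2 = \sqrt{2 \sin^2 \epsilon} = \sqrt{2} \sin \epsilon$.

Exercise 2.54 It is enough to show the same case as Exercise 2.52. Choose $\epsilon$ as
$d(x, y) = \epsilon$. Then, $|x\rangle - |y\rangle = \begin{pmatrix} 1 - \cos \epsilon \\ -\sin \epsilon \end{pmatrix}$. Thus, $\| |x\rangle - |y\rangle \|^2 = (1 - \cos \epsilon)^2 + \sin^2 \epsilon = 2(1 - \cos \epsilon) = 4 \sin^2 \frac{\epsilon}{2}$. The second inequality follows from $\sin \frac{\epsilon}{2} \le \frac{\epsilon}{2}$.

Exercise 2.55 Use the relation $\int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}$.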


Exercise 2.56 Since u, € > 0, we have (u + €,/1 — 1)? > u? + (e/1 — 1)?. Thus 


lata fine 1 en" du 7 few ie ew du 7 fe “eVIHT 6 utev=I gy 


v2 v2 v2 
tJ i—-1-e/l=1 oo 2 
; ““d 
o-ed-)) i : d 2 20-1) Jo ll u 
v2 v2 
el 1 el lc (I D2. 
v2 
Exercise 2.57 
k-1-$ 11 1 oly _ 
(a) Use B(k — 5, = ear BH 1 = 5,5) and BG, 5) = 7. 
(b) Since eo +x) is concave, we have log(1 + = = J ee = x). Thus ar 
k1/2 
log x5 i= =) log(1 + Ik a= = + ie  log(1 ae = Pe a 3 i= =) log ms = 7 
log 12 12 _ = eee 1). 


1/2 
(©) Due to (b), we have >-/_) log 2k} < —+5 log(2/ — 1). Thus (2/ — 1)B(l— 


Ll) < (21 — a(t — 1)? = J — Da. 

Chapter 3 

Quantum Hypothesis Testing and Discrimination of Quantum States 


Abstract Various types of information processing occur in quantum systems. The 
most fundamental processes are state discrimination and hypothesis testing. These 
problems often form the basis for an analysis of other types of quantum informa- 
tion processes. The difficulties associated with the noncommutativity of quantum 
mechanics appear in the most evident way among these problems. Therefore, we 
examine state discrimination and hypothesis testing before examining other types of 
information processing in quantum systems in this text. In two-state discrimination, 
we discriminate between two unknown candidate states by performing a measure- 
ment and examining the measurement data. Note that in this case, the two hypotheses 
for the unknown state are treated symmetrically. In contrast, if the two hypotheses 
are treated asymmetrically, the process is called hypothesis testing rather than state 
discrimination. Hypothesis testing is not only interesting in itself but is also relevant 
to other topics in quantum information theory. In particular, the quantum version of 
Stein’s lemma, which is the central topic of this chapter, is closely related to quantum 
channel coding discussed in Chap. 4. Moreover, Stein’s lemma is also connected to 
the distillation of maximally entangled states, as discussed in Sect. 8.5, in addition 
to other topics discussed in Chap.9. The importance of Stein’s lemma may not be 
apparent at first sight since it considers the tensor product states of identical states, 
which rarely appear in real communications. However, the asymptotic analysis for 
these tensor product states provides the key to the analysis of asymptotic problems 
in quantum communications. For these reasons, this topic is discussed in an earlier 
chapter in this text. 


3.1 Information Quantities in Quantum Systems 
3.1.1 Quantum Entropic Information Quantities 


For the preparation, we discuss the quantum extension of information quantities
given in Sect. 2.1. Let us first consider the von Neumann entropy of the density
matrix $\rho$ with the spectral decomposition $\rho = \sum_i p_i |u^i\rangle\langle u^i|$ as its quantum extension of the entropy.$^1$ The von Neumann entropy is defined as the entropy of the
probability distribution $p = \{p_i\}$ of the eigenvalues of the density matrix $\rho$, and it is
denoted by $H(\rho)$. Applying the arguments of Sect. 1.5 to $f(x) = \log(x)$, we have
$\log\rho = \sum_i (\log p_i)|u^i\rangle\langle u^i|$, and we can write $H(\rho)$ as

\[ H(\rho) = -\mathrm{Tr}\,\rho\log\rho. \tag{3.1} \]

The von Neumann entropy also satisfies the concavity, as proved in Sect. 5.5. Similarly, the Rényi entropy is defined as $H_{1-s}(\rho) := \frac{\psi(s|\rho)}{s}$ with $\psi(s|\rho) := \log\mathrm{Tr}\,\rho^{1-s}$.
Henceforth, we will use its abbreviation $\psi(s)$ as mentioned previously. The minimum and maximum entropies are defined as $H_{\min}(\rho) := -\log\|\rho\|$ and $H_{\max}(\rho) := \log\mathrm{Tr}\{\rho > 0\}$, and satisfy relations similar to (2.40).
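Since all of these entropies depend on $\rho$ only through its eigenvalues, they can be evaluated from a list of eigenvalues. The following is a minimal sketch under that assumption; the function names and the sample spectrum are illustrative and not part of the text.

```python
import math

def von_neumann_entropy(eigs):
    """H(rho) = -Tr rho log rho, computed from the eigenvalues of rho."""
    return -sum(p * math.log(p) for p in eigs if p > 0)

def psi(s, eigs):
    """psi(s|rho) = log Tr rho^{1-s}; zero eigenvalues are skipped."""
    return math.log(sum(p ** (1 - s) for p in eigs if p > 0))

def renyi_entropy(s, eigs):
    """H_{1-s}(rho) = psi(s|rho) / s for s != 0."""
    return psi(s, eigs) / s

def min_entropy(eigs):
    """H_min(rho) = -log ||rho||, with ||rho|| the largest eigenvalue."""
    return -math.log(max(eigs))

def max_entropy(eigs):
    """H_max(rho) = log Tr {rho > 0} = log(rank rho)."""
    return math.log(sum(1 for p in eigs if p > 0))

# Hypothetical spectrum of a rank-3 density matrix on a 4-dimensional space.
eigs = [0.5, 0.3, 0.2, 0.0]
print(min_entropy(eigs), von_neumann_entropy(eigs), max_entropy(eigs))
print(renyi_entropy(1e-6, eigs))   # approaches H(rho) as s -> 0
print(renyi_entropy(1.0, eigs))    # H_0(rho), which coincides with H_max(rho)
```

The ordering $H_{\min}(\rho) \le H(\rho) \le H_{\max}(\rho)$ visible in the first printed line is the kind of relation referred to by (2.40).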

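Since the diagonal elements of a diagonal matrix form a probability distribution,
we can interpret the tensor product $\rho^{\otimes n}$ as the quantum-mechanical analog
of the independent and identical distribution. In other words, the eigenvalues of $\rho^{\otimes n}$
are equal to the n-i.i.d. of the probability distribution resulting from the eigenvalues
of $\rho$. Since $\{\rho^s a^{-s} > I\}(\rho^s a^{-s} - I) \ge 0$ for $s \ge 0$, the inequalities

\[ \{\rho > a\} = \{\rho^s a^{-s} > I\} \le \{\rho^s a^{-s} > I\}\rho^s a^{-s} \le \rho^s a^{-s} \tag{3.2} \]

hold. Similarly,

\[ \{\rho \le a\} \le \rho^{-s} a^{s}. \]

Hence, we obtain

\[ \mathrm{Tr}\,\rho\{\rho > a\} \le \mathrm{Tr}\,\rho^{1+s} a^{-s}, \tag{3.3} \]
\[ \mathrm{Tr}\,\rho\{\rho \le a\} \le \mathrm{Tr}\,\rho^{1-s} a^{s}, \tag{3.4} \]

for $a > 0$ and $0 \le s$. Treating the independent and identical distribution in a manner
similar to (2.42) and (2.44), we obtain

\[ \mathrm{Tr}\,\rho^{\otimes n}\{\rho^{\otimes n} \le e^{-nR}\} \le e^{n \min_{0 \le s}(\psi(s) - sR)}, \tag{3.5} \]
\[ \mathrm{Tr}\,\rho^{\otimes n}\{\rho^{\otimes n} > e^{-nR}\} \le e^{n \min_{s \le 0}(\psi(s) - sR)}. \tag{3.6} \]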
Certainly, the relationship similar to the classical system holds concerning sigh sk 
and H(p). 
As an extension of the relative entropy, we define the quantum relative entropy
$D(\rho\|\sigma)$ for two density matrices $\rho$ and $\sigma$ as

\[ D(\rho\|\sigma) := \mathrm{Tr}\,\rho(\log\rho - \log\sigma). \tag{3.7} \]

$^1$ Historically, the von Neumann entropy for a density matrix $\rho$ was first defined by von Neumann
[1]. Following this definition, Shannon [2] defined the entropy for a probability distribution.


The quantum relative entropy satisfies an inequality similar to (2.13), which will be
discussed in greater detail in Sect. 5.4. For the two quantum states $\rho$ and $\sigma$, we can
also define the function $\phi(s|\rho\|\sigma) := \log(\mathrm{Tr}\,\rho^{1-s}\sigma^{s})$ and obtain (Exe. 3.4)

\[ \phi'(0|\rho\|\sigma) = -D(\rho\|\sigma), \qquad \phi'(1|\rho\|\sigma) = D(\sigma\|\rho). \tag{3.8} \]


When it is not necessary to explicitly specify $\rho$ and $\sigma$, we will abbreviate this value to
$\phi(s)$. If $\rho$ commutes with $\sigma$, the quantity $\phi(s|\rho\|\sigma)$ is equal to the quantity $\phi(s|P\|Q)$
with the probability distributions $P$ and $Q$ that consist of eigenvalues.

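Since $\phi(s|\rho\|\sigma)$ is a convex function of $s$ (Exe. 3.5), a quantum extension of relative
Rényi entropy

\[ D_{1+s}(\rho\|\sigma) := \frac{\phi(-s|\rho\|\sigma)}{s} = \frac{1}{s}\log\mathrm{Tr}\,\rho^{1+s}\sigma^{-s} \tag{3.9} \]

is monotone increasing for $s$. Hence, we define the minimum and the maximum
relative entropies as [3]

\[ D_{\max}(\rho\|\sigma) := \log\|\sigma^{-\frac12}\rho\sigma^{-\frac12}\|, \qquad D_{\min}(\rho\|\sigma) := -\log\mathrm{Tr}\,\sigma\{\rho > 0\}. \tag{3.10} \]

Hence, we obtain the relations

\[ \lim_{s\to 1} D_{1-s}(\rho\|\sigma) = D_{\min}(\rho\|\sigma), \qquad \lim_{s\to 0} D_{1-s}(\rho\|\sigma) = D(\rho\|\sigma). \tag{3.11} \]

Also, we can show the inequality (Exe. 3.20)

\[ D_{1-s}(\rho\|\sigma) \le D_{\max}(\rho\|\sigma) \tag{3.12} \]

for $s \in [-1, 1)$. Due to the non-commutativity, we define another function $\tilde\phi(s|\rho\|\sigma) := \log\mathrm{Tr}\bigl(\sigma^{\frac{s}{2(1-s)}}\rho\,\sigma^{\frac{s}{2(1-s)}}\bigr)^{1-s}$ and another quantum extension of relative Rényi entropy
(sandwiched relative Rényi entropy) [4, 5]:

\[ \underline{D}_{1+s}(\rho\|\sigma) := \frac{\tilde\phi(-s|\rho\|\sigma)}{s} = \frac{1}{s}\log\mathrm{Tr}\bigl(\sigma^{-\frac{s}{2(1+s)}}\rho\,\sigma^{-\frac{s}{2(1+s)}}\bigr)^{1+s}. \tag{3.13} \]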


These relative entropies satisfy the additivity (Exe. 3.2). By a simple calculation (Exe. 3.6), we
obtain

\[ \tilde\phi'(0|\rho\|\sigma) = -D(\rho\|\sigma), \tag{3.14} \]

which implies

\[ \lim_{s\to 0}\underline{D}_{1+s}(\rho\|\sigma) = D(\rho\|\sigma). \tag{3.15} \]

Also, by a calculation (Exe. 3.7), we obtain

\[ \lim_{s\to\infty}\underline{D}_{1+s}(\rho\|\sigma) = D_{\max}(\rho\|\sigma). \tag{3.16} \]


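Further, as shown in Sect. 3.8, we have

\[ \underline{D}_{1+s}(\rho\|\sigma) = \lim_{n\to\infty}\frac{D_{1+s}(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{n} \tag{3.17} \]

for $s > -1$, which is equivalent to $\tilde\phi(-s|\rho\|\sigma) = \lim_{n\to\infty}\frac{\phi(-s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{n}$ for $s > -1$.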


Lemma 3.1 The functions $s \mapsto \phi(s|\rho\|\sigma)$ and $\tilde\phi(s|\rho\|\sigma)$ are convex for $s \in [-1, \infty)$.
The functions $s \mapsto D_{1+s}(\rho\|\sigma)$ and $\underline{D}_{1+s}(\rho\|\sigma)$ are monotone increasing with respect
to $s \in [-1, \infty)$.

Proof Since the limit of convex functions is also convex, Relation (3.17) implies that
the function $s \mapsto \tilde\phi(s|\rho\|\sigma)$ is convex. The convexity of $s \mapsto \phi(s|\rho\|\sigma)$ is shown in
Exercise 3.5. Using these two facts, we can show that the functions $s \mapsto D_{1+s}(\rho\|\sigma)$
and $\underline{D}_{1+s}(\rho\|\sigma)$ are monotone increasing with respect to $s \in [-1, \infty)$.


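The above information quantities satisfy the monotonicity with respect to the
measurement $M$ as follows.

\[ D(\rho\|\sigma) \ge D(P^M_\rho\|P^M_\sigma), \tag{3.18} \]
\[ \phi(s|\rho\|\sigma) \le \phi(s|P^M_\rho\|P^M_\sigma) \quad \text{for } 0 \le s \le 1, \tag{3.19} \]
\[ \phi(s|\rho\|\sigma) \ge \phi(s|P^M_\rho\|P^M_\sigma) \quad \text{for } s \le 0, \tag{3.20} \]
\[ \tilde\phi(s|\rho\|\sigma) \le \phi(s|P^M_\rho\|P^M_\sigma) \quad \text{for } 0 \le s \le \tfrac12, \tag{3.21} \]
\[ \tilde\phi(s|\rho\|\sigma) \ge \phi(s|P^M_\rho\|P^M_\sigma) \quad \text{for } s \le 0. \tag{3.22} \]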


Proofs of (3.18), (3.20), and (3.22) are given in Sect. 3.8. Inequality (3.19) is shown in
Sect. A.4 as a more general argument (5.52). However, we omit the proof of (3.21).
For the proof, see [6]. In contrast to $b(\rho, \sigma)$ and $d_1(\rho, \sigma)$, although there exists a
POVM $M$ satisfying the equalities in (3.18) and (3.22) only when $\rho$ and $\sigma$ commute,
as shown in Theorem 3.6 and Exercise 3.62, there exists a sequence of POVMs
attaining the equalities in (3.18) and (3.22) in an asymptotic sense, as mentioned in
Exercise 5.44 and Exercise 3.62, respectively. However, the equality in (3.20) for $s \le -1$
does not necessarily hold even in an asymptotic sense, as verified from Exercises
3.58 and 5.22. The inequalities (3.19), (3.20), (3.21), and (3.22) are rewritten as the
monotonicity of the quantum relative Rényi entropy with measurement

\[ D_{1-s}(\rho\|\sigma) \ge D_{1-s}(P^M_\rho\|P^M_\sigma) \quad \text{for } s \le 1, \tag{3.23} \]
\[ \underline{D}_{1-s}(\rho\|\sigma) \ge D_{1-s}(P^M_\rho\|P^M_\sigma) \quad \text{for } s \le \tfrac12. \tag{3.24} \]

$^2$ Here, the monotonicity concerns only the state evolution, not parameter $s$.

Further, combining (3.23), (3.17), and their additivity, we have

\[ D_{1+s}(\rho\|\sigma) \ge \underline{D}_{1+s}(\rho\|\sigma) \tag{3.25} \]

for $s > -1$. For another proof of (3.25), see Exercise 3.8.
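The relations (3.15), (3.16), and (3.25) can be checked numerically on a small example. The sketch below assumes a qubit, a real symmetric $\rho$, and a diagonal $\sigma = \mathrm{diag}(q_0, q_1)$, so that every matrix power reduces to closed-form 2x2 eigen-decompositions; the helper names and the sample states are illustrative only.

```python
import math

def eig2(m):
    """Eigenpairs (value, unit vector) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    if b == 0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + rad, mean - rad):
        n = math.hypot(b, lam - a)
        pairs.append((lam, (b / n, (lam - a) / n)))
    return pairs

def rel_entropy(rho, q):
    """D(rho||sigma) of (3.7) for sigma = diag(q[0], q[1])."""
    tr_rho_log_rho = sum(a * math.log(a) for a, _ in eig2(rho))
    tr_rho_log_sigma = rho[0][0] * math.log(q[0]) + rho[1][1] * math.log(q[1])
    return tr_rho_log_rho - tr_rho_log_sigma

def petz(s, rho, q):
    """D_{1+s} of (3.9): (1/s) log Tr rho^{1+s} sigma^{-s}."""
    tr = sum(a ** (1 + s) * (v[0] ** 2 * q[0] ** (-s) + v[1] ** 2 * q[1] ** (-s))
             for a, v in eig2(rho))
    return math.log(tr) / s

def sandwiched(s, rho, q):
    """(3.13): (1/s) log Tr (sigma^g rho sigma^g)^{1+s} with g = -s / (2(1+s))."""
    g = -s / (2 * (1 + s))
    off = (q[0] * q[1]) ** g
    m = ((q[0] ** (2 * g) * rho[0][0], off * rho[0][1]),
         (off * rho[1][0], q[1] ** (2 * g) * rho[1][1]))
    return math.log(sum(lam ** (1 + s) for lam, _ in eig2(m))) / s

def d_max(rho, q):
    """D_max of (3.10): log of the largest eigenvalue of sigma^{-1/2} rho sigma^{-1/2}."""
    off = math.sqrt(q[0] * q[1])
    m = ((rho[0][0] / q[0], rho[0][1] / off), (rho[1][0] / off, rho[1][1] / q[1]))
    return math.log(max(lam for lam, _ in eig2(m)))

rho = ((0.7, 0.2), (0.2, 0.3))   # hypothetical full-rank state
q = (0.6, 0.4)                   # sigma = diag(0.6, 0.4)
for s in (-0.5, 1e-5, 0.5, 1.0, 2.0, 50.0):
    print(s, petz(s, rho, q), sandwiched(s, rho, q))
print("D =", rel_entropy(rho, q), " D_max =", d_max(rho, q))
```

In each printed row the first value should not fall below the second, in line with (3.25), and the row for small $s$ should be close to $D(\rho\|\sigma)$.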
Exercises 


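3.1 Show that the information quantities $D(p\|q)$ and $D_{1+s}(p\|q)$ between $q$ and $p$
are equal to their quantum versions $D(\rho\|\sigma)$ and $D_{1+s}(\rho\|\sigma)$ for commuting $\rho$ and $\sigma$
with diagonalizations $\rho = \sum_i p_i|u_i\rangle\langle u_i|$ and $\sigma = \sum_i q_i|u_i\rangle\langle u_i|$.


3.2 Choose density matrices $\rho_A, \sigma_A$ in $\mathcal{H}_A$ and density matrices $\rho_B, \sigma_B$ in $\mathcal{H}_B$. Show
that

\[ H(\rho_A) + H(\rho_B) = H(\rho_A \otimes \rho_B), \tag{3.26} \]
\[ D(\rho_A\|\sigma_A) + D(\rho_B\|\sigma_B) = D(\rho_A \otimes \rho_B\|\sigma_A \otimes \sigma_B), \tag{3.27} \]
\[ D_{1-s}(\rho_A\|\sigma_A) + D_{1-s}(\rho_B\|\sigma_B) = D_{1-s}(\rho_A \otimes \rho_B\|\sigma_A \otimes \sigma_B), \tag{3.28} \]
\[ \underline{D}_{1-s}(\rho_A\|\sigma_A) + \underline{D}_{1-s}(\rho_B\|\sigma_B) = \underline{D}_{1-s}(\rho_A \otimes \rho_B\|\sigma_A \otimes \sigma_B), \tag{3.29} \]
\[ D_{\max}(\rho_A\|\sigma_A) + D_{\max}(\rho_B\|\sigma_B) = D_{\max}(\rho_A \otimes \rho_B\|\sigma_A \otimes \sigma_B), \tag{3.30} \]
\[ D_{\min}(\rho_A\|\sigma_A) + D_{\min}(\rho_B\|\sigma_B) = D_{\min}(\rho_A \otimes \rho_B\|\sigma_A \otimes \sigma_B). \tag{3.31} \]

3.3 Show that

\[ \mathrm{Tr}\,\rho f(X) \ge f(\mathrm{Tr}\,\rho X) \tag{3.32} \]

for a convex function $f$, a Hermitian matrix $X$, and a state $\rho$. This is a quantum
version of Jensen's inequality.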


3.4 Show (3.8) using Exercise 1.4. 


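3.5 Show that $\phi(s|\rho\|\sigma)$ is convex by following the steps below [7].

(a) Show that $\phi'(s|\rho\|\sigma) = \dfrac{\mathrm{Tr}\,\rho^{1-s}\sigma^{s}(\log\sigma - \log\rho)}{\mathrm{Tr}\,\rho^{1-s}\sigma^{s}}$ by using Exercise 1.4.

(b) Show that $\dfrac{d}{ds}\mathrm{Tr}\,\rho^{1-s}\sigma^{s}(\log\sigma - \log\rho) = \mathrm{Tr}\,\rho^{1-s}(\log\sigma - \log\rho)\sigma^{s}(\log\sigma - \log\rho)$.

(c) Show that

\[ \phi''(s|\rho\|\sigma) = \frac{\mathrm{Tr}\,\rho^{1-s}(\log\sigma - \log\rho)\sigma^{s}(\log\sigma - \log\rho)}{\mathrm{Tr}\,\rho^{1-s}\sigma^{s}} - \frac{\bigl(\mathrm{Tr}\,\rho^{1-s}\sigma^{s}(\log\sigma - \log\rho)\bigr)^2}{\bigl(\mathrm{Tr}\,\rho^{1-s}\sigma^{s}\bigr)^2}. \]

(d) Show the convexity of $\phi(s|\rho\|\sigma)$ using Schwarz's inequality.

3.6 Show (3.14).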


3.7 Show (3.16). 


3.8 Show (3.25) by using the Araki-Lieb-Thirring inequalities [8, 9] $\mathrm{Tr}\,B^{\frac r2}A^{r}B^{\frac r2} \le \mathrm{Tr}(B^{\frac12}AB^{\frac12})^{r}$ with $r \in (0, 1)$, and $\mathrm{Tr}\,B^{\frac r2}A^{r}B^{\frac r2} \ge \mathrm{Tr}(B^{\frac12}AB^{\frac12})^{r}$ with $r \ge 1$.


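3.9 Define the state $\rho_s := e^{-\psi(s|\rho)}\rho^{1-s}$ and assume that a state $\sigma$ satisfies $H(\sigma) = H(\rho_s)$. Show that $D(\rho_s\|\rho) \le D(\sigma\|\rho)$ for $s < 1$ by following the steps below.

(a) Show that $\frac{1}{1-s}D(\sigma\|\rho_s) = \frac{1}{1-s}\mathrm{Tr}\,\sigma\log\sigma - \mathrm{Tr}\,\sigma\log\rho + \frac{\psi(s|\rho)}{1-s}$.

(b) Show that $D(\sigma\|\rho) - \frac{1}{1-s}D(\sigma\|\rho_s) = D(\rho_s\|\rho)$.

(c) Show the desired inequality.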


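3.10 Show the quantum extension of (2.55):

\[ \sup_{0 \le s \le 1}\frac{sR - \psi(s|\rho)}{1-s} = \min_{\sigma: H(\sigma) \ge R} D(\sigma\|\rho) \tag{3.33} \]

following the steps below.

(a) Define $s_R$ in the same way as in Exercise 2.24. Show that

\[ \min_{\sigma: H(\sigma) = R} D(\sigma\|\rho) = D(\rho_{s_R}\|\rho). \tag{3.34} \]

(b) Show that

\[ \min_{\sigma: H(\sigma) \ge R} D(\sigma\|\rho) = D(\rho_{s_R}\|\rho). \tag{3.35} \]

(c) Show (3.33).


3.11 Show that

\[ \sup_{s \le 0}\frac{sR - \psi(s|\rho)}{1-s} = \min_{\sigma: H(\sigma) \le R} D(\sigma\|\rho) \tag{3.36} \]

following the steps below.

(a) Show that

\[ \min_{\sigma: H(\sigma) = R} D(\sigma\|\rho) = D(\rho_{s_R}\|\rho). \tag{3.37} \]

(b) Show that

\[ \min_{\sigma: H(\sigma) \le R} D(\sigma\|\rho) = D(\rho_{s_R}\|\rho). \tag{3.38} \]

(c) Show (3.36).


3.12 Show that $\underline{D}_{1+s}(\rho\|\sigma)$ has the following expressions.

\[ \underline{D}_{1+s}(\rho\|\sigma) = \frac{1}{s}\log\mathrm{Tr}\bigl(\rho^{\frac12}\sigma^{-\frac{s}{1+s}}\rho^{\frac12}\bigr)^{1+s} \tag{3.39} \]
\[ = \frac{1+s}{s}\log\bigl\|\rho^{\frac12}\sigma^{-\frac{s}{1+s}}\rho^{\frac12}\bigr\|_{1+s} = \frac{1+s}{s}\log\bigl\|\sigma^{-\frac{s}{2(1+s)}}\rho\,\sigma^{-\frac{s}{2(1+s)}}\bigr\|_{1+s}. \tag{3.40} \]


3.13 Show that

\[ \|\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\| = \min\{x \mid \rho \le x\sigma\}. \tag{3.41} \]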


3.1.2 Other Quantum Information Quantities


Next, we discuss other types of quantum information quantities. As the quantum
version of the Hellinger distance $d_2(p, q)$, we introduce the Bures distance $b(\rho, \sigma)$
defined as

\[ b^2(\rho, \sigma) := \min_{U:\text{unitary}}\frac12\mathrm{Tr}(\sqrt\rho - \sqrt\sigma U)(\sqrt\rho - \sqrt\sigma U)^*. \tag{3.42} \]

The Bures distance $b(\rho, \sigma)$ also satisfies the axioms of a distance in a similar way to
the Hellinger distance (Exe. 3.15). Using (A.19), this quantity may be rewritten as

\[
\begin{aligned}
b^2(\rho, \sigma) &= 1 - \frac12\max_{U:\text{unitary}}\mathrm{Tr}\bigl(U\sqrt\rho\sqrt\sigma + U^*(\sqrt\rho\sqrt\sigma)^*\bigr) \\
&= 1 - \max_{U:\text{unitary}}\mathrm{Re}\,\mathrm{Tr}\,U\sqrt\rho\sqrt\sigma \\
&= 1 - \mathrm{Tr}|\sqrt\rho\sqrt\sigma| = 1 - \mathrm{Tr}\sqrt{\sqrt\rho\,\sigma\sqrt\rho}.
\end{aligned}
\]

Therefore, this value does not change when $\rho$ and $\sigma$ are interchanged. Later, we
will see that this quantity also satisfies similar information inequalities (Corollary 8.4).
The quantity $\mathrm{Tr}|\sqrt\rho\sqrt\sigma|$ is called fidelity and is denoted by $F(\rho, \sigma)$, which
satisfies $\log F(\rho, \sigma) = \tilde\phi(\frac12|\rho\|\sigma)$. Then, it follows that

\[ b^2(\rho, \sigma) = 1 - F(\rho, \sigma). \tag{3.43} \]

If one of the states is a pure state, then

\[ F(|u\rangle\langle u|, \rho) = \sqrt{\langle u|\rho|u\rangle}. \tag{3.44} \]

The square of this value corresponds to a probability. If both $\rho$ and $\sigma$ are pure states
$|u\rangle\langle u|$ and $|v\rangle\langle v|$, respectively, then $\mathrm{Tr}\sqrt{\sqrt\rho\,\sigma\sqrt\rho} = |\langle u|v\rangle|$ and the Bures distance
is given by

\[ b^2(|u\rangle\langle u|, |v\rangle\langle v|) = 1 - |\langle u|v\rangle|. \]
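For a concrete feel for (3.43) and (3.44), the fidelity of two qubit states can be evaluated in closed form. The sketch below relies on a standard 2x2 identity that is not stated in the text: the eigenvalues $l_1, l_2$ of $\rho\sigma$ satisfy $l_1 + l_2 = \mathrm{Tr}\,\rho\sigma$ and $l_1 l_2 = \det\rho\,\det\sigma$, so $F^2 = \mathrm{Tr}\,\rho\sigma + 2\sqrt{\det\rho\,\det\sigma}$. The function names and sample states are illustrative, and the matrices are assumed real symmetric.

```python
import math

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr|sqrt(rho) sqrt(sigma)| for 2x2 density matrices.

    Uses F^2 = Tr(rho sigma) + 2 sqrt(det rho * det sigma), valid in dimension 2.
    The max(..., 0.0) guards absorb tiny negative rounding errors.
    """
    tr = sum(rho[i][j] * sigma[j][i] for i in range(2) for j in range(2))
    cross = math.sqrt(max(det(rho) * det(sigma), 0.0))
    return math.sqrt(max(tr + 2 * cross, 0.0))

def bures_sq(rho, sigma):
    """b^2(rho, sigma) = 1 - F(rho, sigma), as in (3.43)."""
    return 1.0 - fidelity(rho, sigma)

def pure(theta):
    """|u><u| for the real unit vector u = (cos(theta), sin(theta))."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c * c, c * s), (c * s, s * s))

rho = ((0.7, 0.2), (0.2, 0.3))       # hypothetical mixed states
sigma = ((0.5, -0.1), (-0.1, 0.5))
print(fidelity(rho, sigma), fidelity(sigma, rho))   # symmetric in its arguments
print(bures_sq(rho, sigma))
print(fidelity(pure(0.3), rho))                     # compare with (3.44)
print(bures_sq(pure(0.3), pure(1.0)))               # compare with 1 - |<u|v>|
```

For the last line, $\langle u|v\rangle = \cos(1.0 - 0.3)$, so the printed value can be compared with $1 - \cos 0.7$.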


We also define the trace norm distance $d_1(\rho, \sigma)$ as a quantum version of the
variational distance by

\[ d_1(\rho, \sigma) := \frac12\|\rho - \sigma\|_1, \tag{3.45} \]

where $\|\cdot\|_1$ denotes the trace norm (Sect. A.3). This also satisfies the monotonicity
[see (5.51) and Exercise 3.29].

If the states involved in the above quantities are pure states such as $|u\rangle\langle u|$ and
$|v\rangle\langle v|$, we shall abbreviate the notation to label the states, i.e., $b(\rho, \sigma)$ will be written
as $b(u, v)$, and so on.

The above information quantities satisfy the monotonicity with respect to the
measurement $M$ as follows.

\[ b(\rho, \sigma) \ge d_2(P^M_\rho, P^M_\sigma), \tag{3.46} \]
\[ d_1(\rho, \sigma) \ge d_1(P^M_\rho, P^M_\sigma). \tag{3.47} \]

Proofs of (3.46) and (3.47) are given in Exercise 3.19 and Sect. 3.4, respectively. As
is discussed in Exercises 3.21–3.23, the equalities of (3.46) and (3.47) hold when
the POVM $M$ is chosen appropriately.

Further, similarly to inequalities (2.25) and (2.26), the inequalities

\[ d_1(\rho, \sigma) \ge b^2(\rho, \sigma) = 1 - F(\rho, \sigma) \ge \frac12 d_1^2(\rho, \sigma), \tag{3.48} \]
\[ D(\rho\|\sigma) \ge -2\log\mathrm{Tr}|\sqrt\rho\sqrt\sigma| \ge 2b^2(\rho, \sigma) \tag{3.49} \]

hold (Exe. 3.24–3.27). Thus, we can show


(3.50) 


Peni. 
2 
From these inequalities we can see that the convergence of $d_1(\rho_n, \sigma_n)$ to 0 is equivalent
to the convergence of $b(\rho_n, \sigma_n)$ to 0.
In order to express the difference between the two states $\rho$ and $\sigma$, we sometimes
focus on the quantity $1 - F^2(\rho, \sigma)$, which is slightly different from $b^2(\rho, \sigma)$. Their
relation can be characterized as (Exe. 3.28)

\[ 2b^2(\rho, \sigma) \ge 1 - F^2(\rho, \sigma) \ge b^2(\rho, \sigma). \tag{3.51} \]


Also, the quantity $1 - F^2(\rho, \sigma)$ upper bounds the quantity $d_1(\rho, \sigma)$ in a way different
from (3.48) as

\[ 1 - F^2(\rho, \sigma) \ge d_1^2(\rho, \sigma). \tag{3.52} \]

Also, we have the quantum Pinsker inequality as follows (Exe. 3.30).

\[ D(\rho\|\sigma) \ge 2d_1^2(\rho, \sigma). \tag{3.53} \]

Note that this is a stronger requirement between $D(\rho\|\sigma)$ and $d_1(\rho, \sigma)$ than the
combination of (3.48) and (3.49).
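The chain of inequalities (3.48), (3.49), (3.51), (3.52), and (3.53) can be spot-checked on randomly drawn qubit states. The sketch below is an illustration under stated assumptions: real symmetric 2x2 density matrices with Bloch radius at most 0.9 (so that all logarithms are finite), natural logarithms, and the closed-form 2x2 fidelity $F^2 = \mathrm{Tr}\,\rho\sigma + 2\sqrt{\det\rho\,\det\sigma}$. All names are hypothetical.

```python
import math
import random

def eig2(m):
    """Eigenpairs (value, unit vector) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    if b == 0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + rad, mean - rad):
        n = math.hypot(b, lam - a)
        pairs.append((lam, (b / n, (lam - a) / n)))
    return pairs

def quad(v, m):
    """<v|M|v> for a real vector v."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def qubit(r, angle):
    """Density matrix (I + r (sin(angle) X + cos(angle) Z)) / 2."""
    x, z = r * math.sin(angle), r * math.cos(angle)
    return ((0.5 * (1 + z), 0.5 * x), (0.5 * x, 0.5 * (1 - z)))

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def fidelity(rho, sigma):
    tr = sum(rho[i][j] * sigma[j][i] for i in range(2) for j in range(2))
    return math.sqrt(max(tr + 2 * math.sqrt(max(det(rho) * det(sigma), 0.0)), 0.0))

def trace_distance(rho, sigma):
    """d_1 = ||rho - sigma||_1 / 2, from the eigenvalues of the difference."""
    diff = tuple(tuple(rho[i][j] - sigma[i][j] for j in range(2)) for i in range(2))
    return 0.5 * sum(abs(lam) for lam, _ in eig2(diff))

def rel_entropy(rho, sigma):
    """D(rho||sigma) = Tr rho log rho - sum_j log(b_j) <v_j|rho|v_j>."""
    return (sum(a * math.log(a) for a, _ in eig2(rho))
            - sum(quad(v, rho) * math.log(b) for b, v in eig2(sigma)))

def check(rho, sigma, tol=1e-9):
    F, d1, D = fidelity(rho, sigma), trace_distance(rho, sigma), rel_entropy(rho, sigma)
    b2 = 1 - F
    return all([
        d1 + tol >= b2 >= 0.5 * d1 ** 2 - tol,          # (3.48)
        D + tol >= -2 * math.log(F) >= 2 * b2 - tol,    # (3.49)
        2 * b2 + tol >= 1 - F ** 2 >= b2 - tol,         # (3.51)
        1 - F ** 2 + tol >= d1 ** 2,                    # (3.52)
        D + tol >= 2 * d1 ** 2,                         # (3.53)
    ])

rng = random.Random(1)
def random_state():
    return qubit(0.9 * rng.random(), 2 * math.pi * rng.random())

results = [check(random_state(), random_state()) for _ in range(200)]
print(all(results))
```

Such a check is of course no proof; it only confirms that the reconstructed directions of the inequalities are consistent with each other on sample states.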


Exercises 


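3.14 Show that the information quantities $d_2(p, q)$ and $d_1(p, q)$ between $q$ and $p$
are equal to their quantum versions $b(\rho, \sigma)$ and $d_1(\rho, \sigma)$ for commuting $\rho$ and $\sigma$ with
diagonalizations $\rho = \sum_i p_i|u_i\rangle\langle u_i|$ and $\sigma = \sum_i q_i|u_i\rangle\langle u_i|$.


3.15 Show that the Bures distance satisfies the axioms of a distance by following
the steps below.
(a) Show the following for arbitrary matrices $X$ and $Y$:

\[ \sqrt{\mathrm{Tr}\,XX^*} + \sqrt{\mathrm{Tr}\,YY^*} \ge \sqrt{\mathrm{Tr}(X - Y)(X - Y)^*}. \]

(b) Show the following for density matrices $\rho_1, \rho_2, \rho_3$ and unitary matrices $U_1, U_2$:

\[
\begin{aligned}
&\sqrt{\mathrm{Tr}(\sqrt{\rho_1} - \sqrt{\rho_2}U_1)(\sqrt{\rho_1} - \sqrt{\rho_2}U_1)^*} \\
&\quad\le \sqrt{\mathrm{Tr}(\sqrt{\rho_1} - \sqrt{\rho_3}U_2)(\sqrt{\rho_1} - \sqrt{\rho_3}U_2)^*} + \sqrt{\mathrm{Tr}(\sqrt{\rho_3} - \sqrt{\rho_2}U_1U_2^*)(\sqrt{\rho_3} - \sqrt{\rho_2}U_1U_2^*)^*}.
\end{aligned}
\]

(c) Show that $b(\rho_1, \rho_2) \le b(\rho_1, \rho_3) + b(\rho_3, \rho_2)$ for density matrices $\rho_1$, $\rho_2$, and $\rho_3$.

3.16 Show that the square of the Bures distance satisfies

\[ b^2(\rho_1, \rho_2) \le 2b^2(\rho_1, \rho_3) + 2b^2(\rho_3, \rho_2). \]

3.17 Show the following regarding the Bures distance for two different orthogonal
bases $\{u_i\}$ and $\{v_i\}$.
(a) Show that the vectors $u = \sum_{i=1}^k\sqrt{p_i}\,e^{i\theta_i}u_i$ and $v = \sum_{i=1}^k\sqrt{p_i}\,e^{i\theta_i}v_i$ satisfy

\[ \langle u|v\rangle = \sum_i p_i\langle u_i|v_i\rangle \tag{3.54} \]

for $\langle u_k|v_j\rangle = 0$, $k \ne j$, an arbitrary real number $\theta_i$, and a probability distribution $p_i$.
(b) Show that (3.54) still holds if $\theta_i$ is chosen appropriately, even if the above conditions
do not hold.


3.18 Show that $d_1(|u\rangle\langle u|, |v\rangle\langle v|) = \sqrt{1 - |\langle u|v\rangle|^2}$ using Exercise A.3.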




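3.19 Show that

\[ \sum_i\sqrt{P^M_\rho(i)}\sqrt{P^M_\sigma(i)} \ge \mathrm{Tr}|\sqrt\rho\sqrt\sigma| \tag{3.55} \]

for a POVM $M$ and $\rho$, $\sigma$ following the steps below [10]. This is equivalent to (3.46).
(a) Show that $\sqrt{\mathrm{Tr}\,X^*X}\sqrt{\mathrm{Tr}\,Y^*Y} \ge |\mathrm{Tr}\,X^*Y|$ for two matrices $X$ and $Y$.

(b) Show that $\sqrt{\mathrm{Tr}\,U\rho^{1/2}M_i\rho^{1/2}U^*}\sqrt{\mathrm{Tr}\,\sigma^{1/2}M_i\sigma^{1/2}} \ge |\mathrm{Tr}\,U\rho^{1/2}M_i\sigma^{1/2}|$ for a unitary
matrix $U$.

(c) Show (3.55).


3.20 Show (3.12) following the steps below.

(a) For any matrix $A$, show that $\|AA^*\| = \|A^*A\|$ by using the polar decomposition
of $A$.

(b) Show that $\mathrm{Tr}\,\rho^2\sigma^{-1} \le \|\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\|$.
(c) Show that $D_{1-s}(\rho\|\sigma)$ is monotonically decreasing with respect to $s$.
(d) Show (3.12).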

3.21 Suppose that the density matrix $\sigma$ possesses the inverse. Show that the
equality in (3.55) holds if $M = \{M_i\}$ is chosen to be the spectral decomposition
of $\rho^{1/2}U^*\sigma^{-1/2} = \sigma^{-1/2}(\sigma^{1/2}\rho\,\sigma^{1/2})^{1/2}\sigma^{-1/2}$, for $U$ satisfying $|\rho^{1/2}\sigma^{1/2}| = U\rho^{1/2}\sigma^{1/2} = \sigma^{1/2}\rho^{1/2}U^*$ [10].


3.22 Suppose that the density matrix $\sigma$ does not possess the inverse. Show that there
exists a POVM satisfying the equality in (3.55) by following the steps below.

(a) Show that the support of matrix $U$ is included in the support $\mathcal{H}_1$ of $\sigma$ when
$|\rho^{1/2}\sigma^{1/2}| = U\rho^{1/2}\sigma^{1/2} = \sigma^{1/2}\rho^{1/2}U^*$.

(b) Let $M = \{M_i\}$ be the spectral decomposition of the matrix $\rho^{1/2}U^*\sigma^{-1/2}$ on $\mathcal{H}_1$
and let $P$ be a projection onto $\mathcal{H}_1$. Show that the POVM $\{M_i\} \cup \{I - P\}$ in $\mathcal{H}$ satisfies
the equality in (3.55).


3.23 Show that the equality in (3.47) holds when the POVM $M$ is the diagonalization
of $\rho - \sigma$.


3.24 Show that $d_1(\rho, \sigma) \ge b^2(\rho, \sigma)$ by choosing a POVM $M$ satisfying the equality
in (3.46).


3.25 Show that $b^2(\rho, \sigma) \ge \frac12 d_1^2(\rho, \sigma)$ by choosing a POVM $M$ satisfying the equality
in (3.47).


3.26 Show that $D(\rho\|\sigma) \ge -2\log\mathrm{Tr}|\sqrt\rho\sqrt\sigma|$ by choosing a POVM $M$ satisfying
the equality in (3.46).


3.27 Show that $-\log\mathrm{Tr}|\sqrt\rho\sqrt\sigma| \ge b^2(\rho, \sigma)$.
3.28 Show (3.51) by writing $x = F(\rho, \sigma) = 1 - b^2(\rho, \sigma)$.
3.29 Show (5.51) using (3.59).
3.30 Show the quantum Pinsker inequality (3.53) following the steps below.

(a) Show that the binary relative entropy $h(x, y) := x\log\frac{x}{y} + (1 - x)\log\frac{1-x}{1-y}$ satisfies
$2(y - x)^2 \le h(x, y)$ for $0 \le x \le y \le 1$.

(b) Show that $2(\mathrm{Tr}\,\sigma P - \mathrm{Tr}\,\rho P) = \mathrm{Tr}|\sigma - \rho| = \mathrm{Tr}(\sigma - \rho)(P - (I - P)) \ge 0$ for
$P = \{\sigma - \rho > 0\}$ or $\{\sigma - \rho \ge 0\}$.

(c) Show (3.53).


3.2 Two-State Discrimination in Quantum Systems 


Consider a quantum system $\mathcal{H}$ whose state is represented by the density matrix $\rho$ or
$\sigma$. Let us consider the problem of determining the density matrix that describes the
true state of the quantum system by performing a measurement. This procedure may
be expressed as a Hermitian matrix $T$ satisfying $I \ge T \ge 0$ in $\mathcal{H}$, and it is called
state discrimination for the following reason.

Consider performing a measurement corresponding to a POVM $M = \{M_\omega\}_{\omega\in\Omega}$
to determine whether the true state is $\rho$ or $\sigma$. For this purpose, we must first choose
subsets of $\Omega$ that correspond to $\rho$ and $\sigma$. That is, we first choose a suitable subset $A$
of $\Omega$, and if $\omega \in A$, we can then determine that the state is $\rho$, and if $\omega \in A^c$ (where $A^c$
is the complement of $A$), then the state is $\sigma$. The Hermitian matrix $T := \sum_{\omega\in A}M_\omega$
then satisfies $I \ge T \ge 0$. When the true state is $\rho$, we erroneously conclude that the
state is $\sigma$ with the probability:

\[ \sum_{\omega\in A^c}\mathrm{Tr}\,\rho M_\omega = \mathrm{Tr}\,\rho\sum_{\omega\in A^c}M_\omega = \mathrm{Tr}\,\rho(I - T). \]

On the other hand, when the true state is $\sigma$, we erroneously conclude that the state is
$\rho$ with the probability:

\[ \sum_{\omega\in A}\mathrm{Tr}\,\sigma M_\omega = \mathrm{Tr}\,\sigma\sum_{\omega\in A}M_\omega = \mathrm{Tr}\,\sigma T. \]

More generally, when we observe $\omega \in \Omega$, we decide the true state is $\rho$ with the
probability $t_\omega$, and it is $\sigma$ with the probability $1 - t_\omega$. This discrimination may therefore
be represented by a map $t_\omega$ from $\Omega$ to the interval $[0, 1]$. When the true state is $\rho$,
defining the Hermitian matrix $T := \sum_{\omega\in\Omega}t_\omega M_\omega$, we erroneously conclude that the
state is $\sigma$ with the probability:

\[ \sum_{\omega\in\Omega}(1 - t_\omega)\mathrm{Tr}\,\rho M_\omega = \mathrm{Tr}\,\rho\sum_{\omega\in\Omega}(1 - t_\omega)M_\omega = \mathrm{Tr}\,\rho(I - T). \]

On the other hand, when the true state is $\sigma$, we erroneously conclude that the state is
$\rho$ with the probability:

\[ \sum_{\omega\in\Omega}t_\omega\mathrm{Tr}\,\sigma M_\omega = \mathrm{Tr}\,\sigma\sum_{\omega\in\Omega}t_\omega M_\omega = \mathrm{Tr}\,\sigma T. \]

Therefore, in order to treat state discrimination, it is sufficient to examine the Hermitian
matrix $T$. The two-valued POVM $\{T, I - T\}$ for a Hermitian matrix $T$ satisfying
$I \ge T \ge 0$ allows us to perform the discrimination. That is, we obtain

\[ \min_{I \ge T \ge 0}\bigl(\mathrm{Tr}\,\rho(I - T) + \mathrm{Tr}\,\sigma T\bigr) = \min_{M}\min_{t: 1 \ge t_\omega \ge 0}\sum_\omega\bigl(P^M_\rho(\omega)(1 - t_\omega) + P^M_\sigma(\omega)t_\omega\bigr). \tag{3.56} \]

Henceforth, $T$ will be called a test.
The problem in state discrimination is to examine the tradeoff between the two
error probabilities $\mathrm{Tr}\,\sigma T$ and $\mathrm{Tr}\,\rho(I - T)$. We then prepare the following lemma.


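Lemma 3.2 (Holevo [11]; Helstrom [12]) Any two non-negative matrices $A$ and $B$
satisfy that

\[ \min_{I \ge T \ge 0}\bigl(\mathrm{Tr}\,A(I - T) + \mathrm{Tr}\,BT\bigr) = \mathrm{Tr}\,A\{A \le B\} + \mathrm{Tr}\,B\{A > B\}. \tag{3.57} \]

The minimum value is attained when $T = \{A > B\}$.


Thus, substituting $\rho$ and $\sigma$ into $A$ and $B$, Lemma 3.2 guarantees that

\[ \min_{I \ge T \ge 0}\bigl(\mathrm{Tr}\,\rho(I - T) + \mathrm{Tr}\,\sigma T\bigr) = \mathrm{Tr}\,\rho\{\rho - \sigma \le 0\} + \mathrm{Tr}\,\sigma\{\rho - \sigma > 0\} \tag{3.58} \]
\[ = 1 - \frac12\|\rho - \sigma\|_1, \tag{3.59} \]

where the second equation follows from the following relation (Exe. 3.32).

\[ \|X\|_1 = \max_{T: -I \le T \le I}\mathrm{Tr}\,XT = \mathrm{Tr}\,X(\{X > 0\} - \{X \le 0\}) = \mathrm{Tr}\,X(I - 2\{X \le 0\}). \tag{3.60} \]

For any POVM $M$, since the RHS of (3.56) is not greater than $\min_{t: 1 \ge t_\omega \ge 0}\sum_\omega\bigl(P^M_\rho(\omega)(1 - t_\omega) + P^M_\sigma(\omega)t_\omega\bigr)$, the combination of (3.56) and (3.59) implies the
inequality $\|\rho - \sigma\|_1 \ge \|P^M_\rho - P^M_\sigma\|_1$, i.e., we obtain (3.47).

The minimization of the weighted sum $\mathrm{Tr}\,\rho(I - T) + c\,\mathrm{Tr}\,\sigma T$ can be treated by
substituting $\rho$ and $c\sigma$ into $A$ and $B$. Therefore, the trace norm gives a measure
for the discrimination of two states. Hence, for examining the tradeoff between the
two error probabilities $\mathrm{Tr}\,\sigma T$ and $\mathrm{Tr}\,\rho(I - T)$, it is sufficient to discuss the test
$T = \{\rho - c\sigma \ge 0\}$ alone. This kind of test is called a likelihood test.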
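The statement of Lemma 3.2 and the identity (3.58)–(3.59) can be illustrated on a pair of qubit states. The sketch below assumes real symmetric 2x2 density matrices and builds the projection $\{\rho - \sigma > 0\}$ from a closed-form eigen-decomposition; the sample states, the helper names, and the random comparison tests are all illustrative.

```python
import math
import random

def eig2(m):
    """Eigenpairs (value, unit vector) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    if b == 0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + rad, mean - rad):
        n = math.hypot(b, lam - a)
        pairs.append((lam, (b / n, (lam - a) / n)))
    return pairs

def trace_prod(x, y):
    """Tr(XY) for 2x2 matrices."""
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))

def positive_part_projection(x):
    """Projection {X > 0} onto the eigenvectors of X with positive eigenvalue."""
    t = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in eig2(x):
        if lam > 0:
            for i in range(2):
                for j in range(2):
                    t[i][j] += v[i] * v[j]
    return t

def error_sum(rho, sigma, t):
    """Tr rho (I - T) + Tr sigma T, using Tr rho = 1."""
    return 1.0 - trace_prod(rho, t) + trace_prod(sigma, t)

rho = ((0.7, 0.2), (0.2, 0.3))      # hypothetical states
sigma = ((0.6, 0.0), (0.0, 0.4))
diff = tuple(tuple(rho[i][j] - sigma[i][j] for j in range(2)) for i in range(2))

t_opt = positive_part_projection(diff)            # the test {rho - sigma > 0}
best = error_sum(rho, sigma, t_opt)
trace_norm = sum(abs(lam) for lam, _ in eig2(diff))
print(best, 1.0 - trace_norm / 2)                 # the two sides of (3.58)-(3.59)

# Randomly drawn tests 0 <= T <= I should never beat the likelihood test.
rng = random.Random(0)
def random_test():
    th, t1, t2 = rng.uniform(0, math.pi), rng.random(), rng.random()
    c, s = math.cos(th), math.sin(th)
    return [[t1 * c * c + t2 * s * s, (t1 - t2) * c * s],
            [(t1 - t2) * c * s, t1 * s * s + t2 * c * c]]

worst_gap = min(error_sum(rho, sigma, random_test()) - best for _ in range(1000))
print(worst_gap >= -1e-12)
```

The random tests are parametrized by an eigenbasis angle and two eigenvalues in $[0, 1]$, which covers the real symmetric tests only; this is a design shortcut of the sketch, not a restriction in the lemma.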

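The error probabilities $\mathrm{Tr}\,\rho\{\rho \le c\sigma\}$ and $\mathrm{Tr}\,\sigma\{\rho > c\sigma\}$ are monotone with respect
to $c$ as follows. Using Lemma 3.2, we can show that

\[ \mathrm{Tr}\,\rho\{\rho \le c\sigma\} \le \mathrm{Tr}\,\rho\{\rho \le c'\sigma\}, \tag{3.61} \]
\[ \mathrm{Tr}\,\sigma\{\rho > c\sigma\} \ge \mathrm{Tr}\,\sigma\{\rho > c'\sigma\} \tag{3.62} \]

when $0 \le c < c'$ as follows. Lemma 3.2 implies that

\[ \mathrm{Tr}\,\rho\{\rho \le c\sigma\} + c\,\mathrm{Tr}\,\sigma\{\rho > c\sigma\} \le \mathrm{Tr}\,\rho\{\rho \le c'\sigma\} + c\,\mathrm{Tr}\,\sigma\{\rho > c'\sigma\}, \tag{3.63} \]
\[ \mathrm{Tr}\,\rho\{\rho \le c'\sigma\} + c'\,\mathrm{Tr}\,\sigma\{\rho > c'\sigma\} \le \mathrm{Tr}\,\rho\{\rho \le c\sigma\} + c'\,\mathrm{Tr}\,\sigma\{\rho > c\sigma\}. \tag{3.64} \]

Hence,

\[
\begin{aligned}
c\bigl(\mathrm{Tr}\,\sigma\{\rho > c\sigma\} - \mathrm{Tr}\,\sigma\{\rho > c'\sigma\}\bigr) &\le \mathrm{Tr}\,\rho\{\rho \le c'\sigma\} - \mathrm{Tr}\,\rho\{\rho \le c\sigma\} \\
&\le c'\bigl(\mathrm{Tr}\,\sigma\{\rho > c\sigma\} - \mathrm{Tr}\,\sigma\{\rho > c'\sigma\}\bigr).
\end{aligned}
\]

The condition $c < c'$ guarantees (3.62). Since $c \ge 0$, (3.62) implies (3.61).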
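The monotonicity (3.61)–(3.62) can be observed numerically by sweeping the threshold $c$. The following sketch again assumes real symmetric qubit states and a closed-form 2x2 eigen-decomposition; the grid of thresholds and the sample states are arbitrary illustrative choices.

```python
import math

def eig2(m):
    """Eigenpairs (value, unit vector) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    if b == 0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + rad, mean - rad):
        n = math.hypot(b, lam - a)
        pairs.append((lam, (b / n, (lam - a) / n)))
    return pairs

def quad(v, m):
    """<v|M|v> for a real vector v and a real symmetric 2x2 matrix M."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def error_pair(rho, sigma, c):
    """Return (Tr rho {rho <= c sigma}, Tr sigma {rho > c sigma})."""
    diff = tuple(tuple(rho[i][j] - c * sigma[i][j] for j in range(2)) for i in range(2))
    first = second = 0.0
    for lam, v in eig2(diff):
        if lam > 0:
            second += quad(v, sigma)   # v spans part of the range of {rho > c sigma}
        else:
            first += quad(v, rho)      # v spans part of the range of {rho <= c sigma}
    return first, second

rho = ((0.7, 0.2), (0.2, 0.3))      # hypothetical states
sigma = ((0.6, 0.0), (0.0, 0.4))
cs = [0.07 * k for k in range(1, 41)]
pairs = [error_pair(rho, sigma, c) for c in cs]
for c, (a, b) in zip(cs[::8], pairs[::8]):
    print(f"c={c:.2f}  Tr rho{{rho<=c sigma}}={a:.4f}  Tr sigma{{rho>c sigma}}={b:.4f}")
```

As $c$ grows, the first column should not decrease and the second should not increase, which is the content of (3.61) and (3.62).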

In order to consider the intuitive picture of the likelihood test $\{\rho > c\sigma\}$, we consider
the case when $\rho$ and $\sigma$ commute, in which they may be simultaneously
diagonalized as $\rho = \sum_i p_i|u^i\rangle\langle u^i|$ and $\sigma = \sum_i q_i|u^i\rangle\langle u^i|$ using a common orthonormal
basis $\{u^1, \ldots, u^d\}$. Therefore, the problem reduces to the discrimination of
the probability distributions $p = \{p_i\}$ and $q = \{q_i\}$, as discussed below. Henceforth,
such cases wherein the states $\rho$ and $\sigma$ commute will be called "classical."

Now, we discriminate between the two probability distributions $p = \{p_i\}$ and
$q = \{q_i\}$ by the following process. When the datum $i$ is observed, we decide the
true distribution is $p$ with the probability $t_i$. This discrimination may therefore be
represented by a map $t_i$ from $\{1, \ldots, d\}$ to the interval $[0, 1]$. Defining the map
$t_i := \langle u^i|T|u^i\rangle$ from $\{1, \ldots, d\}$ to the interval $[0, 1]$ for an arbitrary discriminator $T$,
we obtain

\[ \sum_i(1 - t_i)p_i = \mathrm{Tr}\,\rho(I - T), \qquad \sum_i t_i q_i = \mathrm{Tr}\,\sigma T. \]

These are the two error probabilities for discriminating the probability distributions
$p = \{p_i\}_i$ and $q = \{q_i\}_i$. If the function $t_i$ is defined on the data set of the measurement
$\{|u^i\rangle\langle u^i|\}_i$ such that it is equal to 1 on the set $\{i \mid p_i > cq_i\}$ and 0 on the set
$\{i \mid p_i \le cq_i\}$, then the test $T$ is equal to $\{\rho > c\sigma\}$. Therefore, if $\rho$ and $\sigma$ commute,
$T = \{\rho > c\sigma\}$ has a correspondence with the subset of the data set. If these density
matrices commute, the problem may be reduced to that of probability distributions,
which simplifies the situation considerably. The notation $\{\rho > c\sigma\}$ can be regarded
as a generalization of a subset of data set.
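In the classical case the likelihood test is simply a decision rule on the data set, which can be written out directly. The sketch below uses two hypothetical three-point distributions and compares the rule $t_i = 1$ on $\{i \mid p_i > cq_i\}$ with every deterministic rule for $c = 1$; names and data are illustrative.

```python
from itertools import product

def likelihood_rule(p, q, c=1.0):
    """Deterministic rule t_i = 1 on {i : p_i > c q_i} and 0 elsewhere."""
    return [1.0 if pi > c * qi else 0.0 for pi, qi in zip(p, q)]

def error_probabilities(p, q, t):
    """Return (sum_i (1 - t_i) p_i, sum_i t_i q_i)."""
    return (sum((1 - ti) * pi for ti, pi in zip(t, p)),
            sum(ti * qi for ti, qi in zip(t, q)))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
t = likelihood_rule(p, q)
e1, e2 = error_probabilities(p, q, t)
d1 = 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))   # variational distance
print(t, e1 + e2, 1 - d1)

# Exhaustive comparison with all deterministic rules on the three data points.
best = min(sum(error_probabilities(p, q, rule))
           for rule in product((0.0, 1.0), repeat=len(p)))
print(best)
```

The sum of the two error probabilities for the likelihood rule matches $1 - d_1(p, q)$, the classical counterpart of (3.59).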
For the likelihood test, we have the following lemma. 


Lemma 3.3 (Audenaert et al. [13]) Any two non-negative matrices $A$ and $B$ satisfy
that

\[ \mathrm{Tr}\,A\{A \le B\} + \mathrm{Tr}\,B\{A > B\} \le \mathrm{Tr}\,A^{1-s}B^{s} \tag{3.65} \]

for $s \in [0, 1]$.


Thus, substituting $\rho$ and $\sigma$ into $A$ and $B$, we have a useful upper bound of the sum
of the two error probabilities as follows.

\[ \min_{I \ge T \ge 0}\bigl(\mathrm{Tr}\,\rho(I - T) + \mathrm{Tr}\,\sigma T\bigr) \le \min_{s\in[0,1]}e^{\phi(s)}. \tag{3.66} \]
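The bound (3.66) can be compared with the exact minimum (3.59) on an example. The sketch below assumes full-rank, real symmetric qubit states and evaluates $e^{\phi(s)} = \mathrm{Tr}\,\rho^{1-s}\sigma^{s}$ through the eigen-decompositions of both states; the grid over $s$ and the sample states are illustrative choices.

```python
import math

def eig2(m):
    """Eigenpairs (value, unit vector) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    if b == 0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + rad, mean - rad):
        n = math.hypot(b, lam - a)
        pairs.append((lam, (b / n, (lam - a) / n)))
    return pairs

def chernoff_quantity(s, rho, sigma):
    """Tr rho^{1-s} sigma^s = sum_{i,j} a_i^{1-s} b_j^s |<v_j|u_i>|^2."""
    total = 0.0
    for a, u in eig2(rho):
        for b, v in eig2(sigma):
            overlap = (u[0] * v[0] + u[1] * v[1]) ** 2
            total += a ** (1 - s) * b ** s * overlap
    return total

def min_error_sum(rho, sigma):
    """min_T Tr rho (I - T) + Tr sigma T = 1 - ||rho - sigma||_1 / 2, cf. (3.59)."""
    diff = tuple(tuple(rho[i][j] - sigma[i][j] for j in range(2)) for i in range(2))
    return 1.0 - 0.5 * sum(abs(lam) for lam, _ in eig2(diff))

rho = ((0.7, 0.2), (0.2, 0.3))       # hypothetical full-rank states
sigma = ((0.5, -0.1), (-0.1, 0.5))
grid = [k / 100 for k in range(101)]
values = [chernoff_quantity(s, rho, sigma) for s in grid]
s_best = grid[values.index(min(values))]
print("exact minimum:", min_error_sum(rho, sigma))
print("upper bound  :", min(values), "attained near s =", s_best)
```

Since (3.65) holds for every $s \in [0, 1]$, the minimum over the grid is still a valid upper bound even though the grid does not locate the exact minimizer.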


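Next, we consider a lower bound of the sum of the two error probabilities.


Lemma 3.4 (Nussbaum and Szkoła [14]) For two non-negative matrices $A$ and $B$,
we make their diagonalizations as

\[ A = \sum_i a_i|u_i\rangle\langle u_i|, \qquad B = \sum_j b_j|v_j\rangle\langle v_j|. \tag{3.67} \]

Then, we have

\[ \min_{I \ge T \ge 0}\bigl(\mathrm{Tr}\,A(I - T) + \mathrm{Tr}\,BT\bigr) \ge \frac12\sum_{i,j}\min\{a_i, b_j\}|\langle v_j|u_i\rangle|^2. \tag{3.68} \]

Now, we consider the case when $A = \rho$ and $B = \sigma$, and define two distributions
$P_{(\rho\|\sigma)}(i, j) := a_i|\langle v_j|u_i\rangle|^2$ and $Q_{(\rho\|\sigma)}(i, j) := b_j|\langle v_j|u_i\rangle|^2$ on $\Omega := \{1, \ldots, \dim\mathcal{H}\} \times \{1, \ldots, \dim\mathcal{H}\}$ based on the notations in Lemma 3.4. Then, the sum
$\sum_{i,j}\min\{a_i, b_j\}|\langle v_j|u_i\rangle|^2$ on the right-hand side of (3.68) can be transformed to
$\min_{t: 1 \ge t_{(i,j)} \ge 0}\sum_{(i,j)}\bigl(P_{(\rho\|\sigma)}(i, j)(1 - t_{(i,j)}) + Q_{(\rho\|\sigma)}(i, j)t_{(i,j)}\bigr)$. That is, the minimum discrimination
probability between two states $\rho$ and $\sigma$ is lower bounded by the half of that between two
distributions $P_{(\rho\|\sigma)}$ and $Q_{(\rho\|\sigma)}$. The pair of distributions $P_{(\rho\|\sigma)}$ and $Q_{(\rho\|\sigma)}$ reflects the
properties of the pair of two states $\rho$ and $\sigma$ as follows. That is, we can show the
following relations (Exe. 3.31).

\[ D(P_{(\rho\|\sigma)}\|Q_{(\rho\|\sigma)}) = D(\rho\|\sigma), \tag{3.69} \]
\[ \phi(s|P_{(\rho\|\sigma)}\|Q_{(\rho\|\sigma)}) = \phi(s|\rho\|\sigma), \tag{3.70} \]
\[ P_{(\rho^{\otimes n}\|\sigma^{\otimes n})} = P^n_{(\rho\|\sigma)}, \qquad Q_{(\rho^{\otimes n}\|\sigma^{\otimes n})} = Q^n_{(\rho\|\sigma)}. \tag{3.71} \]

These relations play important roles in later sections.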
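The construction of $P_{(\rho\|\sigma)}$ and $Q_{(\rho\|\sigma)}$ is easy to carry out explicitly for qubits. The sketch below (real symmetric, full-rank sample states; illustrative names) builds both distributions, compares $D(P_{(\rho\|\sigma)}\|Q_{(\rho\|\sigma)})$ with an independent evaluation of $D(\rho\|\sigma)$ as in (3.69), and compares the lower bound (3.68) with the exact minimum (3.59).

```python
import math

def eig2(m):
    """Eigenpairs (value, unit vector) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    if b == 0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + rad, mean - rad):
        n = math.hypot(b, lam - a)
        pairs.append((lam, (b / n, (lam - a) / n)))
    return pairs

def quad(v, m):
    """<v|M|v> for a real vector v."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def ns_distributions(rho, sigma):
    """P(i,j) = a_i |<v_j|u_i>|^2 and Q(i,j) = b_j |<v_j|u_i>|^2."""
    P, Q = {}, {}
    for i, (a, u) in enumerate(eig2(rho)):
        for j, (b, v) in enumerate(eig2(sigma)):
            w = (u[0] * v[0] + u[1] * v[1]) ** 2
            P[i, j], Q[i, j] = a * w, b * w
    return P, Q

def classical_rel_entropy(P, Q):
    return sum(P[k] * math.log(P[k] / Q[k]) for k in P if P[k] > 0)

def quantum_rel_entropy(rho, sigma):
    """Tr rho log rho - Tr rho log sigma, the latter via <v_j|rho|v_j>."""
    return (sum(a * math.log(a) for a, _ in eig2(rho))
            - sum(quad(v, rho) * math.log(b) for b, v in eig2(sigma)))

def min_error_sum(rho, sigma):
    diff = tuple(tuple(rho[i][j] - sigma[i][j] for j in range(2)) for i in range(2))
    return 1.0 - 0.5 * sum(abs(lam) for lam, _ in eig2(diff))

rho = ((0.7, 0.2), (0.2, 0.3))       # hypothetical full-rank states
sigma = ((0.5, -0.1), (-0.1, 0.5))
P, Q = ns_distributions(rho, sigma)
lower = 0.5 * sum(min(P[k], Q[k]) for k in P)       # right-hand side of (3.68)
print(sum(P.values()), sum(Q.values()))             # both are probability distributions
print(classical_rel_entropy(P, Q), quantum_rel_entropy(rho, sigma))
print(lower, min_error_sum(rho, sigma))
```

Note that $\min\{a_i, b_j\}|\langle v_j|u_i\rangle|^2 = \min\{P_{(\rho\|\sigma)}(i,j), Q_{(\rho\|\sigma)}(i,j)\}$, which is why the lower bound can be computed from the two distributions alone.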


Proof of Lemma 3.2 The quantity to be minimized can be rewritten as

\[ \mathrm{Tr}\,A(I - T) + \mathrm{Tr}\,BT = \mathrm{Tr}\,A + \mathrm{Tr}(B - A)T. \]

Now, we diagonalize $B - A$ as $B - A = \sum_i\lambda_i|u_i\rangle\langle u_i|$. Then,

\[ \mathrm{Tr}(B - A)T = \sum_i\lambda_i\mathrm{Tr}\,|u_i\rangle\langle u_i|T. \]

The test $T$ minimizing the above satisfies the following conditions: $\mathrm{Tr}\,|u_i\rangle\langle u_i|T = 0$
when $\lambda_i \ge 0$; $\mathrm{Tr}\,|u_i\rangle\langle u_i|T = 1$ when $\lambda_i < 0$. The test $T$ satisfying these conditions
is nothing other than $\{B - A < 0\}$. Accordingly, we have

\[ \min_{I \ge T \ge 0}\mathrm{Tr}(B - A)T = \mathrm{Tr}(B - A)\{B - A < 0\}. \tag{3.72} \]

Equality (3.57) can be proved according to

\[
\begin{aligned}
\min_{I \ge T \ge 0}\mathrm{Tr}\,A(I - T) + \mathrm{Tr}\,BT &= \mathrm{Tr}\,A + \mathrm{Tr}(B - A)\{B - A < 0\} \\
&= \mathrm{Tr}\,A\{A - B \le 0\} + \mathrm{Tr}\,B\{A - B > 0\}.
\end{aligned}
\]

Then, we can also obtain (3.58). See Exercise 3.33 for the derivation of (3.59).


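Proof of Lemma 3.3 We employ an alternative proof by Narutaka Ozawa [15, 16].
Since $A - B \le (A - B)_+$, we have $A \le B + (A - B)_+$. Similarly, the inequality
$B + (A - B)_+ \ge B$ holds. Hence, the matrix monotonicity of $x \mapsto x^s$ (Sect. 1.5)
yields that

\[ A^s \le (B + (A - B)_+)^s, \tag{3.73} \]
\[ (B + (A - B)_+)^s - B^s \ge 0, \tag{3.74} \]
\[ (B + (A - B)_+)^{1-s} \ge B^{1-s}. \tag{3.75} \]

Hence,

\[
\begin{aligned}
\mathrm{Tr}\,A - \mathrm{Tr}\,A^{1-s}B^{s} &= \mathrm{Tr}\,A^{1-s}\bigl(A^s - B^s\bigr) \\
&\le \mathrm{Tr}\,A^{1-s}\bigl((B + (A - B)_+)^s - B^s\bigr) &(3.76)\\
&\le \mathrm{Tr}\,(B + (A - B)_+)^{1-s}\bigl((B + (A - B)_+)^s - B^s\bigr) &(3.77)\\
&= \mathrm{Tr}(B + (A - B)_+) - \mathrm{Tr}(B + (A - B)_+)^{1-s}B^s \\
&\le \mathrm{Tr}(B + (A - B)_+) - \mathrm{Tr}\,B^{1-s}B^s &(3.78)\\
&= \mathrm{Tr}\,B + \mathrm{Tr}(A - B)_+ - \mathrm{Tr}\,B = \mathrm{Tr}(A - B)_+, &(3.79)
\end{aligned}
\]

where (3.76), (3.77), and (3.78) follow from (3.73), (3.74), and (3.75), respectively.
Thus, (3.79) implies

\[
\begin{aligned}
\mathrm{Tr}\,A\{A \le B\} + \mathrm{Tr}\,B\{A > B\} &= \mathrm{Tr}\,A - \mathrm{Tr}(A - B)\{A - B > 0\} \\
&= \mathrm{Tr}\,A - \mathrm{Tr}(A - B)_+ \le \mathrm{Tr}\,A^{1-s}B^{s}.
\end{aligned}
\]
■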


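Proof of Lemma 3.4 Thanks to Lemma 3.2, we can restrict the matrix $T$ to a projection.
Then, we obtain

\[ \mathrm{Tr}\,BT = \mathrm{Tr}\,BTT = \mathrm{Tr}\sum_j b_j|v_j\rangle\langle v_j|T\sum_i|u_i\rangle\langle u_i|T = \sum_{i,j}b_j|\langle u_i|T|v_j\rangle|^2. \]

Similarly, we have

\[ \mathrm{Tr}\,A(I - T) = \sum_{i,j}a_i|\langle u_i|I - T|v_j\rangle|^2. \]

Since $|\langle u_i|T|v_j\rangle|^2 + |\langle u_i|I - T|v_j\rangle|^2 \ge \frac12|\langle u_i|T|v_j\rangle + \langle u_i|I - T|v_j\rangle|^2 = \frac12|\langle v_j|u_i\rangle|^2$, we have

\[
\begin{aligned}
\mathrm{Tr}\,A(I - T) + \mathrm{Tr}\,BT &= \sum_{i,j}\bigl(b_j|\langle u_i|T|v_j\rangle|^2 + a_i|\langle u_i|I - T|v_j\rangle|^2\bigr) \\
&\ge \sum_{i,j}\min\{b_j, a_i\}\bigl(|\langle u_i|T|v_j\rangle|^2 + |\langle u_i|I - T|v_j\rangle|^2\bigr) \\
&\ge \frac12\sum_{i,j}\min\{a_i, b_j\}|\langle v_j|u_i\rangle|^2.
\end{aligned}
\]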


Exercises 

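3.31 Show the relations (3.69), (3.70), and (3.71).
3.32 Show (3.60) referring to the proof of (3.72).
3.33 Show (3.59) using (3.60).

3.34 Show that $\|\rho - \rho_{\mathrm{mix}}\|_1 \ge 2\bigl(1 - \frac{\mathrm{rank}\,\rho}{d}\bigr)$.

3.35 Show that

\[ \mathrm{Tr}\,A\{\sqrt A \le \sqrt B\} + \mathrm{Tr}\,B\{\sqrt A > \sqrt B\} \le \mathrm{Tr}\sqrt A\sqrt B \tag{3.80} \]

by following the steps below.
(a) Show the following inequalities.

\[ \mathrm{Tr}\,A\{\sqrt A \le \sqrt B\} \le \mathrm{Tr}\sqrt A\sqrt B\{\sqrt A \le \sqrt B\}, \tag{3.81} \]
\[ \mathrm{Tr}\,B\{\sqrt A > \sqrt B\} \le \mathrm{Tr}\sqrt A\sqrt B\{\sqrt A > \sqrt B\}. \tag{3.82} \]

(b) Show the inequality (3.80).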


3.3. Discrimination of Plural Quantum States 


In this section, we extend the discussion of the previous section, where there were
only two hypothesis states, to the case of $k$ hypothesis states $\rho_1, \ldots, \rho_k$. The state
discrimination in this case is given by a POVM $M = \{M_i\}_{i=1}^k$ with $k$ measurement
outcomes. For a fixed $i$, the quality of the discrimination is given by the error
probability $1 - \mathrm{Tr}\,\rho_iM_i$. We are then required to determine a POVM $M = \{M_i\}_{i=1}^k$ that
minimizes this error probability. However, since it is impossible to reduce the error
probability for all cases, some a priori probability distribution $p_i$ is often assumed
for the $k$ hypotheses, and the average error probability $\sum_{i=1}^k p_i(1 - \mathrm{Tr}\,\rho_iM_i)$ is
minimized. Therefore, we maximize the linear function

\[ \Lambda(M_1, \ldots, M_k) := \sum_{i=1}^k p_i\mathrm{Tr}\,\rho_iM_i = \mathrm{Tr}\Bigl(\sum_{i=1}^k p_i\rho_iM_i\Bigr) \]

with respect to the matrix-valued vector $(M_1, \ldots, M_k)$ under the condition

\[ M_i \ge 0, \qquad \sum_{i=1}^k M_i = I. \]

For this maximization problem, we have the following equation$^3$ [17]:

\[ P_{\mathrm{guess}} := \max\Bigl\{\Lambda(M_1, \ldots, M_k)\Bigm| M_i \ge 0, \sum_{i=1}^k M_i = I\Bigr\} \tag{3.83} \]
\[ = \min\{\mathrm{Tr}\,F \mid F \ge p_i\rho_i\}. \tag{3.84} \]

When the matrix $F$ and the POVM $(M_1, \ldots, M_k)$ satisfy this constraint condition,
they satisfy $\mathrm{Tr}\,F - \mathrm{Tr}\bigl(\sum_i p_i\rho_iM_i\bigr) = \sum_i\mathrm{Tr}\,M_i(F - p_i\rho_i) \ge 0$. Hence, the
inequality LHS $\le$ RHS holds in (3.84). The direct derivation of the reverse inequality is
rather difficult; however, it can be treated by generalized linear programming.
Equation (3.84) can be immediately obtained from Theorem 3.1, called the generalized
duality theorem for linear programming, as explained after Theorem 3.1.
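For $k = 2$ the two sides of (3.83)–(3.84) can be written down explicitly, which gives a small sanity check of the duality. The sketch below assumes two real symmetric qubit states with equal priors; it takes the POVM $M_1 = \{A - B > 0\}$, $M_2 = I - M_1$ with $A = p_1\rho_1$, $B = p_2\rho_2$ as a primal candidate and $F = \frac12(A + B + |A - B|)$ as a dual candidate. This choice of $F$ and all names are assumptions of the illustration, not taken from the text.

```python
import math

def eig2(m):
    """Eigenpairs (value, unit vector) of a real symmetric 2x2 matrix."""
    (a, b), (_, d) = m
    if b == 0:
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + rad, mean - rad):
        n = math.hypot(b, lam - a)
        pairs.append((lam, (b / n, (lam - a) / n)))
    return pairs

def combine(x, y, cx=1.0, cy=1.0):
    """cx * X + cy * Y for 2x2 matrices."""
    return tuple(tuple(cx * x[i][j] + cy * y[i][j] for j in range(2)) for i in range(2))

def spectral(x, f):
    """sum_i f(lambda_i) v_i v_i^T for a real symmetric 2x2 matrix X."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in eig2(x):
        for i in range(2):
            for j in range(2):
                out[i][j] += f(lam) * v[i] * v[j]
    return tuple(tuple(row) for row in out)

def trace_prod(x, y):
    return sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))

priors = (0.5, 0.5)                                  # hypothetical priors and states
rho1 = ((0.7, 0.2), (0.2, 0.3))
rho2 = ((0.5, -0.1), (-0.1, 0.5))
zero = ((0.0, 0.0), (0.0, 0.0))
identity = ((1.0, 0.0), (0.0, 1.0))
A = combine(rho1, zero, priors[0], 0.0)
B = combine(rho2, zero, priors[1], 0.0)
diff = combine(A, B, 1.0, -1.0)

# Primal candidate: M1 = {A - B > 0}, M2 = I - M1.
M1 = spectral(diff, lambda lam: 1.0 if lam > 0 else 0.0)
M2 = combine(identity, M1, 1.0, -1.0)
primal_value = trace_prod(A, M1) + trace_prod(B, M2)

# Dual candidate: F = (A + B + |A - B|) / 2, which dominates both A and B.
F = combine(combine(A, B), spectral(diff, abs), 0.5, 0.5)
dual_value = F[0][0] + F[1][1]
slack = min(lam for X in (A, B) for lam, _ in eig2(combine(F, X, 1.0, -1.0)))

print(primal_value, dual_value)   # a feasible pair with equal values is optimal
print(slack >= -1e-12)            # F - p_i rho_i is positive semidefinite
```

Because any feasible POVM gives a value no larger than any feasible $F$, equality of the two printed values certifies that both candidates are optimal for this instance.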
Theorem 3.1 Consider the real vector spaces $V_1$, $V_2$. Let $L$ be a closed convex cone
of $V_1$ (Sect. A.4). Assume that arbitrary $a > 0$ and $x \in L$ satisfy $ax \in L$. If $A$ is
a linear map from $V_1$ to $V_2$, then the following relation is satisfied for $b \in V_2$ and
$c \in V_1$:

\[ \max\{\langle c, x\rangle \mid x \in L, A(x) = b\} = \min\{\langle y, b\rangle \mid A^*(y) - c \in L^*\}, \tag{3.85} \]

where $L^* := \{x \in V_1 \mid \langle x, x'\rangle \ge 0, \forall x' \in L\}$.
For a proof, see Sect. 3.9.

Using the relation (3.84), we derive another characterization for $P_{\mathrm{guess}}$ in Sect. 5.6.
Now, we explain how to derive (3.84) by using Theorem 3.1. In our current
problem, $V_1$ is the space consisting of $k$ Hermitian matrices with an inner product
$\langle(M_1, \ldots, M_k), (M'_1, \ldots, M'_k)\rangle = \sum_{i=1}^k\mathrm{Tr}\,M_iM'_i$, $V_2$ is the space of Hermitian
matrices with the usual product $\langle X, Y\rangle = \mathrm{Tr}\,XY$, $L$ is the subset in $V_1$ such that all
the matrices are positive semidefinite, $A$ is the map $(M_1, \ldots, M_k) \mapsto \sum_{i=1}^k M_i$, and
$b$ and $c$ are $I$ and $(p_1\rho_1, \ldots, p_k\rho_k)$, respectively. Applying Theorem 3.1, we obtain
(3.84). Therefore, we have rewritten the multivariable maximization problem on the
left-hand side (LHS) of (3.84) into a single-variable minimization problem involving
only one Hermitian matrix on the right-hand side (RHS) of (3.84). In general it is
difficult to further analyze such optimization problems except for problems involving
special symmetries [18]. Due to the fundamental nature of this problem, it is possible
to reuse our results here in the context of other problems, as will be discussed in
later sections. In these problems, it is sufficient to evaluate only the upper and lower
bounds. Therefore, although it is generally difficult to obtain the optimal values, their
upper and lower bounds can be more readily obtained.

$^3$ $\max\{a \mid b\}$ denotes the maximum value of $a$ satisfying condition $b$.

In this section, the problem was formulated in terms of generalized linear pro- 
gramming [19]. However, it is also possible to formulate the problem in terms of 
semidefinite programming (SDP) [20]. The semidefinite programming problem has 
been studied extensively, and many numerical packages are available for this prob- 
lem. Therefore, for numerical calculations it is convenient to recast the given problem 
in terms of SDP [21]. 

The generalized duality theorem given here may also be applied to other problems 
such as the minimization problem appearing on the RHS of (6.106) in Sect. 6.6 [22–24]
and the problem involving the size and accuracy of the maximally entangled
state [25] that can be produced by class @ introduced in Sect. 8.16. Therefore, this 
theorem is also interesting from the viewpoint of several optimization problems in 
quantum information theory.⁴


Exercise 


3.36 Show that the average correct probability $\frac{1}{k}\sum_{i=1}^k \mathrm{Tr}\, \rho_i M_i$ with the uniform distribution is less than $\frac{d}{k}$. Furthermore, show that it is less than $\frac{d}{k}\max_i \|\rho_i\|$.


3.4 Asymptotic Analysis of State Discrimination 


It is generally very difficult to infer the density matrix of the state of a single quantum system. Hence, an incorrect guess might be obtained from a single measurement. To avoid this situation, one can prepare many independent systems with the same state and then perform measurements on these. In this case, we would perform individual measurements on each system and analyze the obtained data statistically. However, it is also possible to infer the unknown state via a single quantum measurement on the composite system. There are many methods available for the second approach as compared with the first. Therefore, it would be interesting to clarify the difference between the optimal performances of the two approaches.

⁴ An example of a numerical solution of the maximization problem in quantum information theory is discussed in Sect. 4.1.2, where we calculate the classical capacity $C_c(W)$ with Nagaoka's quantum version of the Arimoto-Blahut algorithm [26, 27], known from classical information theory [28, 29]. The connection between these quantities and linear programming has also been discussed widely [20, 30].

Let us consider the problem of state discrimination for unknown states given by tensor product states such as $\rho^{\otimes n}$ and $\sigma^{\otimes n}$. This may be regarded as a quantum-mechanical extension of an independent and identical distribution. If arbitrary measurements on $\mathcal{H}^{\otimes n}$ are allowed to perform this discrimination, we can identify the Hermitian matrix $T$ satisfying $I \ge T \ge 0$ on $\mathcal{H}^{\otimes n}$ as the discriminator. If we restrict the allowable measurements performed on $\mathcal{H}^{\otimes n}$ to be separable or adaptive, the problem becomes somewhat more complicated.

Let us consider the first case. The minimum of the sum of the two error probabilities is then given by $\min_{I \ge T \ge 0} \left(\mathrm{Tr}\, \rho^{\otimes n}(I - T) + \mathrm{Tr}\, \sigma^{\otimes n} T\right)$, and it approaches 0 as $n$ increases. Since this quantity approaches 0 exponentially with $n$, our problem is then to calculate this exponent. By using Lemmas 3.3 and 3.4, the exponent can be characterized as follows.


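Lemma 3.5 (Chernoff [31]) Any two density matrices $\rho$ and $\sigma$ on the system $\mathcal{H}$ satisfy

$$\min_{I \ge T \ge 0} \left(\mathrm{Tr}\, \rho^{\otimes n}(I - T) + \mathrm{Tr}\, \sigma^{\otimes n} T\right) \le \exp\Bigl(n \inf_{1 \ge s \ge 0} \phi(s)\Bigr), \tag{3.86}$$

where $\phi(s) = \phi(s|\rho\|\sigma)$ was defined in Sect. 3.1. In the limit $n \to \infty$, the equation

$$\lim_{n\to\infty} -\frac{1}{n} \log \min_{I \ge T \ge 0} \left(\mathrm{Tr}\, \rho^{\otimes n}(I - T) + \mathrm{Tr}\, \sigma^{\otimes n} T\right) = -\inf_{1 \ge s \ge 0} \phi(s) \tag{3.87}$$

holds.

Since the classical case of Lemma 3.5 was shown by Chernoff [31], the bound $-\inf_{1 \ge s \ge 0} \phi(s)$ is called the Chernoff bound.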


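Proof (3.86) follows from Lemma 3.3 with $A = \rho^{\otimes n}$ and $B = \sigma^{\otimes n}$. (3.86) shows the part "$\ge$" in (3.87). Hence, we show the part "$\le$" in (3.87) by using Lemma 3.4. Now, we define two distributions $P := P_{(\rho\|\sigma)}$ and $Q := Q_{(\rho\|\sigma)}$. Then, we can show that $\phi(s|P\|Q) = \phi(s|\rho\|\sigma) = \phi(s)$ (see Exercise 3.37). Applying Lemma 3.4 to the case with $A = \rho^{\otimes n}$ and $B = \sigma^{\otimes n}$, we have

$$2 \min_{I \ge T \ge 0} \left(\mathrm{Tr}\, \rho^{\otimes n}(I - T) + \mathrm{Tr}\, \sigma^{\otimes n} T\right) \ge P^n\{\omega^n \in \Omega^n \mid P^n(\omega^n) \le Q^n(\omega^n)\} + Q^n\{\omega^n \in \Omega^n \mid P^n(\omega^n) > Q^n(\omega^n)\}.$$

Hence, it is sufficient to show that

$$\lim_{n\to\infty} -\frac{1}{n} \log \Bigl(P^n\{\omega^n \in \Omega^n \mid P^n(\omega^n) \le Q^n(\omega^n)\} + Q^n\{\omega^n \in \Omega^n \mid P^n(\omega^n) > Q^n(\omega^n)\}\Bigr) = \max_{s \in [0,1]} -\phi(s|P\|Q). \tag{3.88}$$

The application of Cramér Theorem (Theorem 2.7) to the random variable $\log \frac{P(\omega)}{Q(\omega)}$ yields that

$$\lim_{n\to\infty} -\frac{1}{n} \log P^n\{\omega^n \in \Omega^n \mid P^n(\omega^n) \le Q^n(\omega^n)\} = \lim_{n\to\infty} -\frac{1}{n} \log P^n\Bigl\{\omega^n \in \Omega^n \Bigm| \frac{1}{n} \log \frac{P^n(\omega^n)}{Q^n(\omega^n)} \le 0\Bigr\} = \sup_{s \in [0,\infty)} -\phi(s|P\|Q). \tag{3.89}$$

Similarly,

$$\lim_{n\to\infty} -\frac{1}{n} \log Q^n\{\omega^n \in \Omega^n \mid P^n(\omega^n) > Q^n(\omega^n)\} = \sup_{s \in (-\infty,1]} -\phi(s|P\|Q). \tag{3.90}$$

Now, we note that $\phi(s|P\|Q) \ge 0$ for $s \in (-\infty, 0] \cup [1, \infty)$ and $\phi(s|P\|Q) \le 0$ for $s \in [0,1]$. Since $\sup_{s \in [0,\infty)} -\phi(s|P\|Q)$ and $\sup_{s \in (-\infty,1]} -\phi(s|P\|Q)$ are nonnegative, we have $\max_{s \in [0,1]} -\phi(s|P\|Q) = \sup_{s \in [0,\infty)} -\phi(s|P\|Q) = \sup_{s \in (-\infty,1]} -\phi(s|P\|Q)$. Thus, the combination of (3.89) and (3.90) yields (3.88). ∎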
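
The bound (3.86) can be illustrated numerically. The following Python sketch is a minimal check under stated assumptions: it takes $\phi(s|\rho\|\sigma) = \log \mathrm{Tr}\, \rho^{1-s}\sigma^s$ as the definition from Sect. 3.1, uses two arbitrarily chosen full-rank qubit states, and evaluates the minimum error sum through the standard trace-norm expression $1 - \frac{1}{2}\|\rho^{\otimes n} - \sigma^{\otimes n}\|_1$. The infimum over $s$ is approximated on a grid, which can only make the bound looser. All function names are illustrative.

```python
import numpy as np

def mpow(a, t):
    """Power of a positive semidefinite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.clip(w, 0.0, None) ** t) @ v.conj().T

def phi(s, rho, sigma):
    """phi(s|rho||sigma) = log Tr rho^(1-s) sigma^s (assumed definition)."""
    return float(np.log(np.trace(mpow(rho, 1 - s) @ mpow(sigma, s)).real))

def tensor_power(a, n):
    """n-fold tensor (Kronecker) power of a matrix."""
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, a)
    return out

def min_error_sum(rho, sigma):
    """min over 0 <= T <= I of Tr rho(I-T) + Tr sigma T, via the trace norm."""
    return 1.0 - 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

# Two illustrative full-rank qubit states (not from the text).
rho = np.array([[0.8, 0.1], [0.1, 0.2]])
sigma = np.array([[0.3, -0.2], [-0.2, 0.7]])

grid = np.linspace(0.0, 1.0, 201)
chernoff = min(phi(s, rho, sigma) for s in grid)   # grid estimate of inf phi(s)

errors, bounds = [], []
for n in range(1, 6):
    errors.append(min_error_sum(tensor_power(rho, n), tensor_power(sigma, n)))
    bounds.append(np.exp(n * chernoff))
    print(n, errors[-1], bounds[-1], -np.log(errors[-1]) / n, -chernoff)
```

The last two printed columns compare the finite-$n$ exponent with the Chernoff bound $-\inf_s \phi(s)$; for such small $n$ they need not be close, since (3.87) is an asymptotic statement.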


Exercises 


3.37 Show equation (3.87). 


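3.38 Show (3.86) when $\rho$ is a pure state $|u\rangle\langle u|$ following the steps below.
(a) Show that $\inf_{1 \ge s \ge 0} \phi(s) = \log \langle u|\sigma|u\rangle$.
(b) Show (3.86) when $T = |u\rangle\langle u|^{\otimes n}$.


3.39 Check that the bound $\inf_{1 \ge s \ge 0} \phi(s)$ can be attained by the test based on the multiple application of the POVM $\{|u\rangle\langle u|, I - |u\rangle\langle u|\}$ on the single system.


3.40 Show (3.87) when $\rho$ and $\sigma$ are pure states $|u\rangle\langle u|$ and $|v\rangle\langle v|$ following the steps below.
(a) Show that $\min_{I \ge T \ge 0} \left(\mathrm{Tr}\, \rho^{\otimes n}(I - T) + \mathrm{Tr}\, \sigma^{\otimes n} T\right) = 1 - \sqrt{1 - |\langle u|v\rangle|^{2n}}$.
(b) Show that $\lim_{n\to\infty} \frac{1}{n} \log \bigl(1 - \sqrt{1 - |\langle u|v\rangle|^{2n}}\bigr) = \log |\langle u|v\rangle|^2$.
(c) Show (3.87).


3.41 Show that $\inf_{1 \ge s \ge 0} \phi(s) = \phi(1/2)$ when $\sigma$ has the form $\sigma = U\rho U^*$ and $U^2 = I$.


3.42 Assume the same assumption as Exercise 3.41. We also assume that $\phi''(1/2) > 0$ and $\phi(1/2|\rho\|\sigma) < \log F(\rho, \sigma)$. Show that $\phi(1/2|\rho\|\sigma) < \min_M \inf_{1 \ge s \ge 0} \phi(s|P^M_\rho\|P^M_\sigma)$ using (3.19) and $\log F(\rho, \sigma) \le \phi(1/2|P^M_\rho\|P^M_\sigma)$, which is shown in Corollary 8.4 in a more general form. Also show that

$$\phi(1/2|\rho\|\sigma) < \lim_{n\to\infty} \frac{1}{n} \min_{M^n} \inf_{1 \ge s \ge 0} \phi\bigl(s \bigm| P^{M^n}_{\rho^{\otimes n}} \bigm\| P^{M^n}_{\sigma^{\otimes n}}\bigr).$$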


3.5 Hypothesis Testing and Stein’s Lemma 


Up until now, the two hypotheses for the two unknown states have been treated 
equally. However, there are situations where the objective is to disprove one of the 
hypotheses (called the null hypothesis) and accept the other (called the alternative 
hypothesis). The problem in this situation is called hypothesis testing. In this case, 
our errors can be classified as follows. If the null hypothesis is rejected despite being 
correct, it is called the error of the first kind. Conversely, if the null hypothesis is 
accepted despite being incorrect, it is called the error of the second kind. Then, 
we make our decision only when we support the alternative hypothesis and withhold 
our decision when we support the null one. Hence, the probability that we make a 
wrong decision is equal to the error probability of the first kind, i.e., the probability 
that an error of the first kind is made (if the null hypothesis consists of more than 
one element, then it is defined as the maximum probability with respect to these 
elements). Hence, we must guarantee that the error probability of the first kind is 
restricted to below a particular threshold. This threshold then represents the reliability, 
in a statistical sense, of our decision and is called the level of significance. The usual 
procedure in hypothesis testing is to fix the level of significance and maximize the 
probability of accepting the alternative hypothesis when it is true; in other words, 
we minimize the error probability of the second kind, which is defined as the 
probability of an error of the second kind. For simplicity, we assume that these two 
hypotheses consist of a single element, i.e., these are given by p and o, respectively. 
Such hypotheses are called simple and are often assumed for a theoretical analysis 
because this assumption simplifies the mathematical treatment considerably. 

As before, we denote our decision by a test $T$ where $0 \le T \le I$, despite the asymmetry of the situation (the event of rejecting the null hypothesis then corresponds to $I - T$). The error probability of the first kind is $\mathrm{Tr}\, \rho(I - T)$, and the error probability of the second kind is $\mathrm{Tr}\, \sigma T$. The discussion in Sect. 3.2 confirms that in order to optimize our test, it is sufficient to treat only the tests of the form $T = \{\sigma - c\rho < 0\}$ [see RHS of (3.57)]. However, the analysis in Sect. 3.4 cannot be reused due to the asymmetrical treatment of the problem here, and therefore another kind of formalism is required. Let us first examine the asymptotic behavior of the error probability of the second kind when the null and the alternative hypotheses are given by the tensor product states $\rho^{\otimes n}$ and $\sigma^{\otimes n}$ on the tensor product space $\mathcal{H}^{\otimes n}$ with a level of significance of $\epsilon > 0$.


Theorem 3.2 (Hiai and Petz [32], Ogawa and Nagaoka [7]) The minimum value of the error probability of the second kind $\beta^n_\epsilon(\rho\|\sigma)$ satisfies

$$\lim_{n\to\infty} -\frac{1}{n} \log \beta^n_\epsilon(\rho\|\sigma) = D(\rho\|\sigma), \quad 1 > \forall \epsilon > 0, \tag{3.91}$$

$$\beta^n_\epsilon(\rho\|\sigma) := \min_{I \ge T \ge 0} \bigl\{\mathrm{Tr}\, \sigma^{\otimes n} T \bigm| \mathrm{Tr}\, \rho^{\otimes n}(I - T) \le \epsilon\bigr\} \tag{3.92}$$

when the error probability of the first kind is below $\epsilon > 0$ (i.e., the level of significance is equal to $\epsilon$).


This theorem is called the quantum Stein's lemma, which is based on its classical counterpart, Stein's lemma. Of course, if $\rho$ and $\sigma$ commute, we may treat this testing problem by classical means according to the arguments given after Lemma 3.2 in the previous section. From (3.8) the relation between the quantum relative entropy $D(\rho\|\sigma)$ and Chernoff's bound $\inf_{1 \ge s \ge 0} \phi(s)$ is illustrated in Fig. 3.1. In particular, when $\sigma = U\rho U^*$ and $U^2 = I$, $D(\rho\|\sigma) \ge -2\phi(1/2) = -2\inf_{1 \ge s \ge 0} \phi(s)$.
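
The last relation can be checked on a small example. The sketch below is illustrative only: it assumes $\phi(s|\rho\|\sigma) = \log \mathrm{Tr}\, \rho^{1-s}\sigma^s$ and $D(\rho\|\sigma) = \mathrm{Tr}\, \rho(\log\rho - \log\sigma)$, picks an arbitrary full-rank qubit state $\rho$, and takes $U$ to be the Pauli $X$ matrix, so that $U^2 = I$. The function names are hypothetical.

```python
import numpy as np

def mfun(a, f):
    """Apply a scalar function to a Hermitian matrix through its spectrum."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def phi(s, rho, sigma):
    """phi(s|rho||sigma) = log Tr rho^(1-s) sigma^s (assumed definition)."""
    a = mfun(rho, lambda w: w ** (1 - s))
    b = mfun(sigma, lambda w: w ** s)
    return float(np.log(np.trace(a @ b).real))

def relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr rho (log rho - log sigma) for full-rank states."""
    return float(np.trace(rho @ (mfun(rho, np.log) - mfun(sigma, np.log))).real)

rho = np.array([[0.8, 0.1], [0.1, 0.2]])   # illustrative full-rank state
u = np.array([[0.0, 1.0], [1.0, 0.0]])     # Pauli X, so U^2 = I
sigma = u @ rho @ u.conj().T

d_rel = relative_entropy(rho, sigma)
phi_half = phi(0.5, rho, sigma)
grid_inf = min(phi(s, rho, sigma) for s in np.linspace(0.0, 1.0, 101))
print(d_rel, -2 * phi_half, grid_inf)
```

In this symmetric situation $\phi(s) = \phi(1-s)$, so the grid minimum of the convex function $\phi$ is located at $s = 1/2$.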

Since the proof below also holds for commuting $\rho$ and $\sigma$, it can be regarded as a proof of the classical Stein's lemma, although it is rather elaborate. The proof of Theorem 3.2 is obtained by first showing Lemmas 3.6 and 3.7.


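Lemma 3.6 (Direct Part, Hiai and Petz [32]) There exists a sequence of Hermitian matrices $\{T_n\}$ on $\mathcal{H}^{\otimes n}$ with $I \ge T_n \ge 0$ such that for arbitrary $\delta > 0$

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n \ge D(\rho\|\sigma) - \delta, \tag{3.93}$$

$$\lim_{n\to\infty} \mathrm{Tr}\, \rho^{\otimes n}(I - T_n) = 0. \tag{3.94}$$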


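Lemma 3.7 (Converse Part, Ogawa and Nagaoka [7]) If a sequence of Hermitian matrices $\{T_n\}$ ($I \ge T_n \ge 0$) on $\mathcal{H}^{\otimes n}$ satisfies

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n > D(\rho\|\sigma), \tag{3.95}$$

then

$$\lim_{n\to\infty} \mathrm{Tr}\, \rho^{\otimes n}(I - T_n) = 1. \tag{3.96}$$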


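Proof of Theorem 3.2 using Lemmas 3.6 and 3.7 For $1 > \epsilon > 0$, we take $\{T_n\}$ to satisfy (3.93) and (3.94) according to Lemma 3.6. Taking a sufficiently large $N$, we have $\mathrm{Tr}\, \rho^{\otimes n}(I - T_n) \le \epsilon$ for $n \ge N$ from (3.94). Therefore, $\beta^n_\epsilon(\rho\|\sigma) \le \mathrm{Tr}\, \sigma^{\otimes n} T_n$, and we see that $\lim_{n\to\infty} -\frac{1}{n} \log \beta^n_\epsilon(\rho\|\sigma) \ge D(\rho\|\sigma) - \delta$. Since $\delta > 0$ is arbitrary, we obtain $\lim_{n\to\infty} -\frac{1}{n} \log \beta^n_\epsilon(\rho\|\sigma) \ge D(\rho\|\sigma)$ by taking the limit $\delta \to 0$.

Now, let $\lim_{n\to\infty} -\frac{1}{n} \log \beta^n_\epsilon(\rho\|\sigma) > D(\rho\|\sigma)$ for a particular $1 > \epsilon > 0$. Then, we can take a sequence of Hermitian matrices $\{T_n\}$ on $\mathcal{H}^{\otimes n}$ with $I \ge T_n \ge 0$ that satisfies

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n > D(\rho\|\sigma), \quad \mathrm{Tr}\, \rho^{\otimes n}(I - T_n) \le \epsilon.$$

However, this contradicts Lemma 3.7, and hence it follows that $\lim_{n\to\infty} -\frac{1}{n} \log \beta^n_\epsilon(\rho\|\sigma) \le D(\rho\|\sigma)$. This proves (3.91). ∎

[Fig. 3.1 Chernoff's bound and quantum relative entropy. The plot shows the curve $y = -\phi(s|\rho\|\sigma)$ with the values $D(\rho\|\sigma)$ and $-\inf_{0 \le s \le 1} \phi(s|\rho\|\sigma)$ marked.]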


It is rather difficult to prove the above two lemmas at this point. Hence, we will 
prove them after discussing several other lemmas forming the basis of the asymptotic 
theory described in Sect. 3.8. In fact, combining Lemmas 3.6 and 3.7, we obtain the 
following theorem (Theorem 3.3), which implies Theorem 3.2. 


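Theorem 3.3 Define

$$B(\rho\|\sigma) := \sup_{\{T_n\}} \Bigl\{\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n \Bigm| \lim_{n\to\infty} \mathrm{Tr}\, \rho^{\otimes n}(I - T_n) = 0\Bigr\},$$

$$B^\dagger(\rho\|\sigma) := \sup_{\{T_n\}} \Bigl\{\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n \Bigm| \lim_{n\to\infty} \mathrm{Tr}\, \rho^{\otimes n}(I - T_n) < 1\Bigr\}.$$

Then,

$$B(\rho\|\sigma) = B^\dagger(\rho\|\sigma) = D(\rho\|\sigma).$$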


As a corollary, we can show the following. 


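Corollary 3.1

$$D(\rho\|\sigma) = \lim_{n\to\infty} \frac{1}{n} \max_M D\bigl(P^M_{\rho^{\otimes n}} \bigm\| P^M_{\sigma^{\otimes n}}\bigr). \tag{3.97}$$

Proof Applying the classical Stein's lemma to the case with $P^M_{\rho^{\otimes n}}$ and $P^M_{\sigma^{\otimes n}}$, we can show that $D(P^M_{\rho^{\otimes n}}\|P^M_{\sigma^{\otimes n}}) = B(P^M_{\rho^{\otimes n}}\|P^M_{\sigma^{\otimes n}}) \le B(\rho^{\otimes n}\|\sigma^{\otimes n}) = nD(\rho\|\sigma)$. Then, we obtain the $\ge$ part. Let $\{T_n\}$ be a sequence of tests achieving the optimal value $\beta^n_\epsilon(\rho\|\sigma)$. Then, for any $\epsilon > 0$, we can prove that

$$\frac{1}{n} D\bigl(P^{\{T_n, I - T_n\}}_{\rho^{\otimes n}} \bigm\| P^{\{T_n, I - T_n\}}_{\sigma^{\otimes n}}\bigr) = \frac{1}{n}\Bigl(\epsilon\bigl(\log \epsilon - \log(1 - \beta^n_\epsilon(\rho\|\sigma))\bigr) + (1 - \epsilon)\bigl(\log(1 - \epsilon) - \log \beta^n_\epsilon(\rho\|\sigma)\bigr)\Bigr) \to (1 - \epsilon) D(\rho\|\sigma).$$

Hence, $\lim_{n\to\infty} \frac{1}{n} \max_M D(P^M_{\rho^{\otimes n}}\|P^M_{\sigma^{\otimes n}}) \ge (1 - \epsilon) D(\rho\|\sigma)$. Taking the limit $\epsilon \to 0$, we obtain the $\le$ part. ∎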


For a further analysis of the direct part, we focus on the decreasing exponent of the error probability of the first kind under an exponential constraint for the error probability of the second kind. For details on the converse part, we assume an exponential constraint of the error probability of the second kind and optimize the decreasing exponent of the correct probability of the first kind. In other words, we treat the following values:

$$B(r|\rho\|\sigma) := \sup_{\{T_n\}} \Bigl\{\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}(I - T_n) \Bigm| \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n \ge r\Bigr\}, \tag{3.98}$$

$$B^*(r|\rho\|\sigma) := \inf_{\{T_n\}} \Bigl\{\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n} T_n \Bigm| \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n \ge r\Bigr\}. \tag{3.99}$$


Then, we obtain the following theorem. 


Theorem 3.4 The relations

$$B(r|\rho\|\sigma) = \sup_{0 \le s \le 1} \frac{-sr - \phi(s|\rho\|\sigma)}{1 - s} \le \min_{\tau : D(\tau\|\sigma) \le r} D(\tau\|\rho), \tag{3.100}$$

$$\sup_{s \le 0} \frac{-sr - \phi(s|\rho\|\sigma)}{1 - s} \le B^*(r|\rho\|\sigma) = \sup_{s \le 0} \frac{-sr - \tilde{\phi}(s|\rho\|\sigma)}{1 - s} \le \min_{\tau : D(\tau\|\sigma) = r} D(\tau\|\rho) \tag{3.101}$$

hold.


The equation in (3.100) will be shown in Sect. 3.7. The inequality in (3.100) will be shown in Exercise 3.57. The equation in (3.101) will be shown in Sect. 3.8. The first inequality in (3.101) will be shown in Sect. 3.8. The second inequality in (3.101) will be shown in Exercise 3.56.

The commutative case, i.e., the classical case, is easier. For two probability distributions $p$ and $\bar{p}$, we have the equations

$$B(r|p\|\bar{p}) = \sup_{0 \le s \le 1} \frac{-sr - \phi(s|p\|\bar{p})}{1 - s} = \min_{q : D(q\|\bar{p}) \le r} D(q\|p), \tag{3.102}$$

$$B^*(r|p\|\bar{p}) = \sup_{s \le 0} \frac{-sr - \phi(s|p\|\bar{p})}{1 - s} = \min_{q : D(q\|\bar{p}) \le r} \bigl(D(q\|p) + r - D(q\|\bar{p})\bigr), \tag{3.103}$$

where all equations except for the second equation in (3.103) hold with all $r \ge 0$. The second equation in (3.103) holds with $r \ge D(p\|\bar{p})$. The first equation in (3.102) will be shown jointly with the general case, i.e., the equation in (3.100). The first equation in (3.103) will be shown in Exercise 3.54. The second equation in (3.102) is shown in Exercise 3.45. The second equation in (3.103) is shown in Exercises 3.52 and 3.53.
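
The classical identity (3.102) can be checked numerically on a small alphabet. The following Python sketch is a minimal illustration under stated assumptions: it takes $\phi(s|p\|\bar{p}) = \log \sum_x p(x)^{1-s}\bar{p}(x)^s$, uses the tilted distributions $p_s(x) \propto p(x)^{1-s}\bar{p}(x)^s$ that appear in the exercises of this section, and relies on the statement of Exercise 3.45 that the minimum on the right-hand side is attained at $p_{s_r}$ with $D(p_{s_r}\|\bar{p}) = r$. The two distributions, the value of $r$, and all function names are illustrative choices.

```python
import numpy as np

# Two illustrative distributions on a three-letter alphabet (not from the text).
p = np.array([0.5, 0.3, 0.2])
pbar = np.array([0.2, 0.3, 0.5])

def phi(s):
    """phi(s|p||pbar) = log sum_x p(x)^(1-s) pbar(x)^s (assumed definition)."""
    return float(np.log(np.sum(p ** (1 - s) * pbar ** s)))

def tilted(s):
    """Tilted distribution p_s(x) = p(x)^(1-s) pbar(x)^s exp(-phi(s))."""
    w = p ** (1 - s) * pbar ** s
    return w / w.sum()

def kl(a, b):
    """Classical relative entropy D(a||b) for strictly positive a, b."""
    return float(np.sum(a * (np.log(a) - np.log(b))))

def solve_s_r(r, tol=1e-13):
    """Bisection for s_r in (0, 1) with D(p_s||pbar) = r.

    D(p_s||pbar) decreases from D(p||pbar) at s = 0 to 0 at s = 1,
    so the root is unique when 0 < r < D(p||pbar).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl(tilted(mid), pbar) > r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r = 0.1                                   # requires r < D(p||pbar)
s_r = solve_s_r(r)
rhs = kl(tilted(s_r), p)                  # candidate value of min D(q||p)
grid = np.linspace(0.0, 0.999, 20001)
lhs = max((-s * r - phi(s)) / (1 - s) for s in grid)
print(kl(p, pbar), s_r, lhs, rhs)
```

The grid maximum `lhs` and the value `rhs` computed from the tilted distribution agree up to the grid resolution, as (3.102) predicts.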

We can also characterize the asymptotic optimal performance of quantum simple hypothesis testing with a general sequence of two quantum states [33, 34]. In this general setting, the main problem is to determine the behavior of $\mathrm{Tr}\, \rho\{\rho - e^{a}\sigma \ge 0\}$ as a function of $a$ [35].

Finally, using Corollary 3.1, we characterize other quantum versions of relative entropy, because there exist many quantum versions of relative entropy even though we impose the condition that the quantity equals the relative entropy with two commutative inputs. To discuss this issue, we denote such a quantity by $\tilde{D}(\rho\|\sigma)$. Then, the condition is written as

$$\tilde{D}(\rho\|\sigma) = D(p\|q) \tag{3.104}$$

for two commutative density matrices $\rho$ and $\sigma$ whose eigenvalues form the probability distributions $p$ and $q$. Now, we impose two additional conditions for a quantity $\tilde{D}(\rho\|\sigma)$. One is the monotonicity for a measurement $M$:

$$\tilde{D}(\rho\|\sigma) \ge D(P^M_\rho\|P^M_\sigma). \tag{3.105}$$

The other is the additivity:

$$\tilde{D}(\rho_1 \otimes \rho_2\|\sigma_1 \otimes \sigma_2) = \tilde{D}(\rho_1\|\sigma_1) + \tilde{D}(\rho_2\|\sigma_2). \tag{3.106}$$

Then, Corollary 3.1 implies that

$$\tilde{D}(\rho\|\sigma) = \lim_{n\to\infty} \frac{1}{n} \tilde{D}(\rho^{\otimes n}\|\sigma^{\otimes n}) \ge \lim_{n\to\infty} \frac{1}{n} \max_M D\bigl(P^M_{\rho^{\otimes n}} \bigm\| P^M_{\sigma^{\otimes n}}\bigr) = D(\rho\|\sigma). \tag{3.107}$$

That is, the quantum relative entropy $D(\rho\|\sigma)$ is the minimum quantum analog of relative entropy with the monotonicity for measurement and the additivity. Note that Condition (3.105) is a weaker requirement than the monotonicity for TP-CP map (5.36), which will be explained in Chap. 5.
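
The monotonicity for a measurement can be observed numerically for the quantum relative entropy itself. The sketch below is illustrative only: it assumes $D(\rho\|\sigma) = \mathrm{Tr}\, \rho(\log\rho - \log\sigma)$, draws random full-rank states on a three-dimensional system, and samples rank-one PVMs from random unitary matrices. The sampling scheme and function names are assumptions made for this example.

```python
import numpy as np

def random_density_matrix(d, rng):
    """Random full-rank density matrix (illustrative helper)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def mlog(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    """Quantum relative entropy D(rho||sigma) = Tr rho (log rho - log sigma)."""
    return float(np.trace(rho @ (mlog(rho) - mlog(sigma))).real)

def measured_relative_entropy(rho, sigma, basis):
    """D(P^M_rho || P^M_sigma) for the rank-one PVM given by the columns of `basis`."""
    p = np.real(np.diag(basis.conj().T @ rho @ basis))
    q = np.real(np.diag(basis.conj().T @ sigma @ basis))
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(1)
d = 3
rho = random_density_matrix(d, rng)
sigma = random_density_matrix(d, rng)
d_quantum = relative_entropy(rho, sigma)

best = 0.0
for _ in range(500):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    basis, _ = np.linalg.qr(g)             # columns form an orthonormal basis
    best = max(best, measured_relative_entropy(rho, sigma, basis))
print(best, d_quantum)
```

Random sampling only gives a lower estimate of $\max_M D(P^M_\rho\|P^M_\sigma)$ over rank-one PVMs, so the printed gap is not the exact gap between the two sides of (3.105).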


Exercises 


In the following, we abbreviate $\phi(s|p\|\bar{p})$ to $\phi(s)$.


3.43 Define the distribution $p_s(x) := p(x)^{1-s}\bar{p}(x)^s e^{-\phi(s)}$ and assume that a distribution $q$ satisfies $D(q\|\bar{p}) = D(p_s\|\bar{p})$. Show that $D(p_s\|p) \le D(q\|p)$ for $s \le 1$ by following the steps below.

(a) Show that $\frac{1}{1-s} D(q\|p_s) = \frac{1}{1-s} \sum_x q(x)(\log q(x) - \log \bar{p}(x)) - \sum_x q(x)(\log p(x) - \log \bar{p}(x)) + \frac{\phi(s)}{1-s}$.

(b) Show $D(q\|p) - \frac{1}{1-s} D(q\|p_s) = D(p_s\|p)$.

(c) Show the desired inequality.


3.44 Show the same argument as Exercise 3.43 in the following alternative way.
(a) Show that $\{p_s(x)\}$ is an exponential family.
(b) Show the desired argument by using Theorem 2.3.


3.45 Show the equation

$$\sup_{0 \le s \le 1} \frac{-sr - \phi(s)}{1 - s} = \min_{q : D(q\|\bar{p}) \le r} D(q\|p) \tag{3.108}$$

following the steps below.

(a) Show that $\frac{-sr - \phi(s)}{1 - s} \le 0$ for $r \ge D(p\|\bar{p})$ and $s \in [0, 1]$.

(b) Show that both sides of (3.108) are zero when $r \ge D(p\|\bar{p})$.

(c) Show that

$$D(p_s\|p_1) = (s - 1)\phi'(s) - \phi(s), \tag{3.109}$$
$$D(p_s\|p_0) = s\phi'(s) - \phi(s). \tag{3.110}$$

(d) Show that

$$\frac{d}{ds}\bigl((s - 1)\phi'(s) - \phi(s)\bigr) = (s - 1)\phi''(s) < 0, \tag{3.111}$$
$$\frac{d}{ds}\bigl(s\phi'(s) - \phi(s)\bigr) = s\phi''(s) > 0 \tag{3.112}$$

for $s \in (0, 1)$.

(e) In the following, we consider the case $r < D(p\|\bar{p})$. Show that there uniquely exists $s_r \in (0, 1)$ such that $D(p_{s_r}\|p_1) = r$.

(f) Show that

$$\min_{q : D(q\|\bar{p}) = r} D(q\|p) = D(p_{s_r}\|p). \tag{3.113}$$

(g) Show that

$$\min_{q : D(q\|\bar{p}) \le r} D(q\|p) = D(p_{s_r}\|p). \tag{3.114}$$

(h) Show that

$$D(p_{s_r}\|p) = \frac{-s_r r - \phi(s_r)}{1 - s_r}. \tag{3.115}$$

(i) Show that

$$\frac{d}{ds} \frac{-sr - \phi(s)}{1 - s} = \frac{-r + (s - 1)\phi'(s) - \phi(s)}{(1 - s)^2}. \tag{3.116}$$

(j) Show that

$$\sup_{0 \le s \le 1} \frac{-sr - \phi(s)}{1 - s} = \frac{-s_r r - \phi(s_r)}{1 - s_r}. \tag{3.117}$$

(k) Show (3.108).


3.6 Hypothesis Testing by Separable Measurements


In the previous section, we performed the optimization with no restriction on the measurements on $\mathcal{H}^{\otimes n}$. In this section, we will restrict the possible measurements to separable measurements. In other words, our test $T$ is assumed to have the separable form:

$$T = \sum_{\omega_n} M^n_{1,\omega_n} \otimes \cdots \otimes M^n_{n,\omega_n}, \quad M^n_{1,\omega_n} \ge 0, \ldots, M^n_{n,\omega_n} \ge 0 \text{ on } \mathcal{H}^{\otimes n},$$

which is called a separable test. This class of tests includes cases such as making identical measurements on every system $\mathcal{H}$ and analyzing measurement data statistically. As explained in (1.25), it also includes other methods such as adaptive improvement of the measurements and statistical analysis of measurement data. The following theorem evaluates the asymptotic performance of the tests based on these measurements.


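Theorem 3.5 Defining $\tilde{B}(\rho\|\sigma)$ as

$$\tilde{B}(\rho\|\sigma) := \sup_{\{T_n\} : \text{separable}} \Bigl\{\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n \Bigm| \lim_{n\to\infty} \mathrm{Tr}\, \rho^{\otimes n}(I - T_n) = 0\Bigr\},$$

we have

$$\tilde{B}(\rho\|\sigma) = \max_M D(P^M_\rho\|P^M_\sigma). \tag{3.118}$$

When the measurement $M_{\max} := \mathop{\mathrm{argmax}}_M D(P^M_\rho\|P^M_\sigma)$ is performed $n$ times, the bound $\max_M D(P^M_\rho\|P^M_\sigma)$ can be asymptotically attained by suitable statistical processing of the $n$ data.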


This theorem shows that in terms of quantities such as $\tilde{B}(\rho\|\sigma)$, there is no asymptotic difference between the optimal classical data processing according to identical measurements $M_{\max}$ on each system and the optimal separable test across systems.

Therefore, at least for this problem, we cannot take advantage of the correlation between quantum systems unless a non-separable measurement is used. Since $\tilde{B}(\rho\|\sigma) \le B(\rho\|\sigma)$, we have the monotonicity of quantum relative entropy for a measurement

$$D(P^M_\rho\|P^M_\sigma) \le D(\rho\|\sigma). \tag{3.119}$$


The following theorem discusses the equality condition of the above inequality. 


Theorem 3.6 (Ohya and Petz [36], Nagaoka [37], Fujiwara [38]) The following conditions are equivalent for two states $\rho$ and $\sigma$ and a rank-one PVM $M = \{M_i\}_{i=1}^d$.

① The equality in (3.119) is satisfied.
② $[\sigma, \rho] = 0$ and there exists a set of real numbers $\{a_i\}_{i=1}^d$ satisfying

$$\rho = \sigma\Bigl(\sum_{i=1}^d a_i M_i\Bigr) = \Bigl(\sum_{i=1}^d a_i M_i\Bigr)\sigma. \tag{3.120}$$

Here, notice that a PVM $M$ satisfying Condition ② is not limited to the simultaneous spectral decomposition of $\rho$ and $\sigma$ (see Exercise 3.46). Theorem 3.6 will be shown in Sect. 5.4. Also, another proof will be given in Exercise 6.32.


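Proof of Theorem 3.5 The fact that $\max_M D(P^M_\rho\|P^M_\sigma)$ can be attained, i.e., the "$\ge$" sign in (3.118), follows from the relation $\tilde{B}(P^M_\rho\|P^M_\sigma) = B(P^M_\rho\|P^M_\sigma) = D(P^M_\rho\|P^M_\sigma)$ shown by Stein's lemma with the classical case. Therefore, we show that $\tilde{B}(\rho\|\sigma)$ does not exceed this value, i.e., the "$\le$" sign in (3.118). It is sufficient to treat $\lim_{n\to\infty} -\frac{1}{n} \log P^{M^n}_{\sigma^{\otimes n}}(A_n)$ for the pair of separable measurements $M^n = \{M^n_{\omega_n}\}_{\omega_n \in \Omega_n}$:

$$M^n_{\omega_n} = M^n_{1,\omega_n} \otimes \cdots \otimes M^n_{n,\omega_n}, \quad M^n_{1,\omega_n} \ge 0, \ldots, M^n_{n,\omega_n} \ge 0 \text{ on } \mathcal{H}^{\otimes n}$$

and a subset $A_n$ of $\Omega_n$ with $P^{M^n}_{\rho^{\otimes n}}(A_n^c) \to 0$. First, we show that

$$\frac{1}{n} D\bigl(P^{M^n}_{\rho^{\otimes n}} \bigm\| P^{M^n}_{\sigma^{\otimes n}}\bigr) \le \max_M D(P^M_\rho\|P^M_\sigma). \tag{3.121}$$

For this purpose, we define $a_{k,\omega_n} := \prod_{j \ne k} \mathrm{Tr}\, M^n_{j,\omega_n}\rho$ and $M^{n,k}_{\omega_n} := a_{k,\omega_n} M^n_{k,\omega_n}$. Since an arbitrary state $\rho'$ on $\mathcal{H}$ satisfies

$$\sum_{\omega_n} \mathrm{Tr}\, \rho' M^{n,k}_{\omega_n} = \sum_{\omega_n} \mathrm{Tr}\, \bigl(\rho^{\otimes (k-1)} \otimes \rho' \otimes \rho^{\otimes (n-k)}\bigr) M^n_{\omega_n} = \mathrm{Tr}\, \rho^{\otimes (k-1)} \otimes \rho' \otimes \rho^{\otimes (n-k)} = 1,$$

we see that $\sum_{\omega_n} M^{n,k}_{\omega_n} = I$; hence, we can verify that $\{M^{n,k}_{\omega_n}\}$ is a POVM. Moreover, we can show (see Exercise 3.47) that

$$D\bigl(P^{M^n}_{\rho^{\otimes n}} \bigm\| P^{M^n}_{\sigma^{\otimes n}}\bigr) = \sum_{k=1}^n D\bigl(P^{M^{n,k}}_\rho \bigm\| P^{M^{n,k}}_\sigma\bigr), \tag{3.122}$$

and thus verify (3.121). Since the monotonicity of the relative entropy for a probability distribution yields

$$P^{M^n}_{\rho^{\otimes n}}(A_n)\bigl(\log P^{M^n}_{\rho^{\otimes n}}(A_n) - \log P^{M^n}_{\sigma^{\otimes n}}(A_n)\bigr) + P^{M^n}_{\rho^{\otimes n}}(A_n^c)\bigl(\log P^{M^n}_{\rho^{\otimes n}}(A_n^c) - \log P^{M^n}_{\sigma^{\otimes n}}(A_n^c)\bigr) \le D\bigl(P^{M^n}_{\rho^{\otimes n}} \bigm\| P^{M^n}_{\sigma^{\otimes n}}\bigr) \le n \max_M D(P^M_\rho\|P^M_\sigma),$$

we obtain

$$-\frac{1}{n} \log P^{M^n}_{\sigma^{\otimes n}}(A_n) \le \frac{\max_M D(P^M_\rho\|P^M_\sigma) + \frac{1}{n} h\bigl(P^{M^n}_{\rho^{\otimes n}}(A_n)\bigr)}{P^{M^n}_{\rho^{\otimes n}}(A_n)}, \tag{3.123}$$

where we used the fact that $-P^{M^n}_{\rho^{\otimes n}}(A_n^c) \log P^{M^n}_{\sigma^{\otimes n}}(A_n^c) \ge 0$, and $h(x)$ is the binary entropy that is expressed as $h(x) = -x \log x - (1 - x)\log(1 - x)$. Noting that $h(x) \le \log 2$ and $P^{M^n}_{\rho^{\otimes n}}(A_n) \to 1$, we have

$$\lim_{n\to\infty} -\frac{1}{n} \log P^{M^n}_{\sigma^{\otimes n}}(A_n) \le \max_M D(P^M_\rho\|P^M_\sigma),$$

from which we obtain (3.118). ∎

Exercises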
Exercises 


3.46 Give an example of $\rho$, $\sigma$, and a PVM $M$ satisfying Condition ② such that the PVM $M$ is not the simultaneous spectral decomposition of $\rho$ and $\sigma$.


3.47 Prove (3.122). 


3.7 Proof of Direct Part of Stein’s Lemma and Hoeffding 
Bound 


In order to prove the direct part of Stein's Lemma, i.e., Lemma 3.6, firstly, we show

$$B(r|\rho\|\sigma) \ge \sup_{0 \le s \le 1} \frac{-sr - \phi(s|\rho\|\sigma)}{1 - s}. \tag{3.124}$$

When $r < D(\rho\|\sigma)$, the right hand side of (3.124) is strictly greater than zero because $-\frac{d\phi(s|\rho\|\sigma)}{ds}\bigr|_{s=0} = D(\rho\|\sigma)$. This fact proves Lemma 3.6.
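
The positivity claim can be illustrated numerically. The sketch below is a minimal example under stated assumptions: it takes $\phi(s|\rho\|\sigma) = \log \mathrm{Tr}\, \rho^{1-s}\sigma^s$, uses two arbitrarily chosen full-rank qubit states, and approximates the supremum on a grid $s \in [0, 0.99]$. It evaluates the right hand side of (3.124) once for a rate below $D(\rho\|\sigma)$ and once for a rate above it. Function names are illustrative.

```python
import numpy as np

def mfun(a, f):
    """Apply a scalar function to a Hermitian matrix through its spectrum."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def phi(s, rho, sigma):
    """phi(s|rho||sigma) = log Tr rho^(1-s) sigma^s (assumed definition)."""
    a = mfun(rho, lambda w: w ** (1 - s))
    b = mfun(sigma, lambda w: w ** s)
    return float(np.log(np.trace(a @ b).real))

def relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr rho (log rho - log sigma) for full-rank states."""
    return float(np.trace(rho @ (mfun(rho, np.log) - mfun(sigma, np.log))).real)

def hoeffding_rhs(r, rho, sigma, grid):
    """Grid approximation of sup over s of (-s*r - phi(s)) / (1 - s)."""
    return max((-s * r - phi(s, rho, sigma)) / (1 - s) for s in grid)

rho = np.array([[0.8, 0.1], [0.1, 0.2]])       # illustrative states
sigma = np.array([[0.3, -0.2], [-0.2, 0.7]])
grid = np.linspace(0.0, 0.99, 991)

d_rel = relative_entropy(rho, sigma)
below = hoeffding_rhs(0.5 * d_rel, rho, sigma, grid)   # rate r < D(rho||sigma)
above = hoeffding_rhs(1.5 * d_rel, rho, sigma, grid)   # rate r > D(rho||sigma)
print(d_rel, below, above)
```

For the rate above $D(\rho\|\sigma)$ the maximum is attained at $s = 0$ and vanishes up to rounding, which is consistent with the convexity of $\phi$ and the slope $-D(\rho\|\sigma)$ at $s = 0$.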

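In order to show (3.124), we apply Lemma 3.3 with $A = e^{-nR}\rho^{\otimes n}$ and $B = \sigma^{\otimes n}$ with an arbitrary real number $R$. Then, we obtain

$$e^{-nR}\, \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} + \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\} \tag{3.125}$$
$$\le e^{-n(1-s)R}\, \mathrm{Tr}\, (\rho^{\otimes n})^{1-s}(\sigma^{\otimes n})^{s} = e^{n(-(1-s)R + \phi(s))}, \tag{3.126}$$

which implies that

$$\mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} \le e^{n(sR + \phi(s))},$$
$$\mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\} \le e^{n(-(1-s)R + \phi(s))}.$$

Given a positive real number $r$ and $s \in [0, 1]$, we choose $R = \frac{r + \phi(s)}{1 - s}$. Then,

$$\mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} \le e^{-n\frac{-sr - \phi(s|\rho\|\sigma)}{1 - s}},$$
$$\mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\} \le e^{-nr},$$

which implies

$$B(r|\rho\|\sigma) \ge \frac{-sr - \phi(s|\rho\|\sigma)}{1 - s}.$$

Taking the maximum with respect to $s \in [0, 1]$, we obtain (3.124).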
In order to show the inequality opposite to (3.124), we prepare the following 
lemma. 


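Lemma 3.8 When $R \in [-D(\sigma\|\rho), D(\rho\|\sigma)]$, we have

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} = \max_{0 \le s \le 1} -sR - \phi(s), \tag{3.127}$$

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\} = \max_{0 \le s \le 1} (1 - s)R - \phi(s). \tag{3.128}$$


Now, we recall the relation (3.61). Then, (3.127) implies

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nD(\rho\|\sigma)}\rho^{\otimes n} \le \sigma^{\otimes n}\} \le \max_{0 \le s \le 1} -sR - \phi(s) \tag{3.129}$$

for $R < D(\rho\|\sigma)$. The limit $R \to D(\rho\|\sigma)$ yields that

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nD(\rho\|\sigma)}\rho^{\otimes n} \le \sigma^{\otimes n}\} = 0. \tag{3.130}$$

Due to the relation (3.61), the relation (3.127) implies that $\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\}$ is positive if and only if $R < D(\rho\|\sigma)$. Similarly, the left hand side of (3.128) is positive if and only if $R > -D(\sigma\|\rho)$.

Since Lemma 3.2 yields that

$$\min_{I \ge T \ge 0} \bigl(e^{-nR}\, \mathrm{Tr}\, \rho^{\otimes n}(I - T) + \mathrm{Tr}\, \sigma^{\otimes n} T\bigr) = e^{-nR}\, \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} + \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\},$$

our test can be restricted to tests with the form $\{\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\}, \{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\}\}$. Thanks to the above observation, using Lemma 3.8, we obtain

$$B(r|\rho\|\sigma) = \sup_{R \in (-D(\sigma\|\rho), D(\rho\|\sigma))} \Bigl\{\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} \Bigm| \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\} \ge r\Bigr\}$$
$$= \sup_{R \in (-D(\sigma\|\rho), D(\rho\|\sigma))} \Bigl\{\max_{s \in [0,1]} -sR - \phi(s) \Bigm| \max_{s \in [0,1]} (1 - s)R - \phi(s) \ge r\Bigr\}$$
$$= \sup_{s \in (0,1)} \bigl\{-sR_s - \phi(s) \bigm| (1 - s)R_s - \phi(s) \ge r\bigr\},$$

where $R_s := -\frac{d}{ds}\phi(s)$ (see Exercise 3.48). Now, we choose $s_0 \in (0, 1)$ such that $(1 - s_0)R_{s_0} - \phi(s_0) = r$. Then, we obtain

$$B(r|\rho\|\sigma) = -s_0 R_{s_0} - \phi(s_0) = \frac{-s_0 r - \phi(s_0)}{1 - s_0}. \tag{3.131}$$

Due to (3.124), we obtain

$$\sup_{s \in (0,1)} \frac{-sr - \phi(s)}{1 - s} = \frac{-s_0 r - \phi(s_0)}{1 - s_0}.$$

Hence, we obtain the inequality opposite to (3.124).
The remaining inequality concerning the Hoeffding bound in Theorem 3.4 is

$$B(r|\rho\|\sigma) \le \min_{\tau : D(\tau\|\sigma) \le r} D(\tau\|\rho), \tag{3.132}$$

which is shown in Exercise 3.57.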

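Proof of Lemma 3.8 Similar to the proof of Lemma 3.5, we define two distributions $P := P_{(\rho\|\sigma)}$ and $Q := Q_{(\rho\|\sigma)}$. The application of Cramér Theorem (Theorem 2.7) to the random variable $\log \frac{P(\omega)}{Q(\omega)}$ yields that

$$\lim_{n\to\infty} -\frac{1}{n} \log P^n\{\omega^n \in \Omega^n \mid e^{-nR}P^n(\omega^n) \le Q^n(\omega^n)\} = \lim_{n\to\infty} -\frac{1}{n} \log P^n\Bigl\{\omega^n \in \Omega^n \Bigm| \frac{1}{n} \log \frac{P^n(\omega^n)}{Q^n(\omega^n)} \le R\Bigr\} = \sup_{s \in [0,\infty)} -sR - \phi(s|P\|Q). \tag{3.133}$$

Here, we can show that

$$\sup_{s \in [0,\infty)} -sR - \phi(s|P\|Q) = \max_{s \in [0,1]} -sR - \phi(s|P\|Q). \tag{3.134}$$

Since the map $s \mapsto \phi(s|P\|Q)$ is convex, the value $R_s = -\frac{d}{ds}\phi(s|P\|Q)$ is monotonically decreasing with respect to $s$. Since $R \in (-D(Q\|P), D(P\|Q))$, $\frac{d}{ds}\phi(s|P\|Q)|_{s=1} = D(Q\|P)$, and $\frac{d}{ds}\phi(s|P\|Q)|_{s=0} = -D(P\|Q)$, using the fact shown in Exercise 3.48, we can show that the above maximum is realized in $(0, 1)$, i.e., (3.134) holds.

Similarly, we can show that

$$\lim_{n\to\infty} -\frac{1}{n} \log Q^n\{\omega^n \in \Omega^n \mid e^{-nR}P^n(\omega^n) > Q^n(\omega^n)\} = \max_{s \in (0,1)} (1 - s)R - \phi(s|P\|Q). \tag{3.135}$$

Now we employ Lemma 3.4 with $A = e^{-nR}\rho^{\otimes n}$ and $B = \sigma^{\otimes n}$. Then,

$$e^{-nR}\, \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} + \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\} \ge \frac{1}{2}\Bigl(e^{-nR}P^n\{\omega^n \in \Omega^n \mid e^{-nR}P^n(\omega^n) \le Q^n(\omega^n)\} + Q^n\{\omega^n \in \Omega^n \mid e^{-nR}P^n(\omega^n) > Q^n(\omega^n)\}\Bigr).$$

Thus,

$$\min\Bigl\{R + \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\},\ \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\}\Bigr\} \le \max_{s \in (0,1)} (1 - s)R - \phi(s|P\|Q). \tag{3.136}$$

In fact, the opposite inequality holds due to (3.125) and (3.126). That is, at least one of (3.127) and (3.128) holds. Assume that (3.127) holds. We choose a sufficiently small $\epsilon > 0$ such that $R + \epsilon \in (-D(Q\|P), D(P\|Q))$. Then, Lemma 3.4 implies that

$$e^{-n(R+\epsilon)}\, \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} + \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\} \ge e^{-n(R+\epsilon)}\, \mathrm{Tr}\, \rho^{\otimes n}\{e^{-n(R+\epsilon)}\rho^{\otimes n} \le \sigma^{\otimes n}\} + \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-n(R+\epsilon)}\rho^{\otimes n} > \sigma^{\otimes n}\}.$$

Applying (3.136) with $R + \epsilon$, we have

$$\min\Bigl\{R + \epsilon + \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\},\ \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\}\Bigr\}$$
$$\le \min\Bigl\{R + \epsilon + \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-n(R+\epsilon)}\rho^{\otimes n} \le \sigma^{\otimes n}\},\ \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-n(R+\epsilon)}\rho^{\otimes n} > \sigma^{\otimes n}\}\Bigr\}$$
$$\le \max_{s \in (0,1)} (1 - s)(R + \epsilon) - \phi(s|P\|Q) \overset{(a)}{<} \epsilon + \max_{s \in (0,1)} (1 - s)R - \phi(s|P\|Q)$$
$$= R + \epsilon + \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\},$$

where the strict inequality (a) follows from the fact that the maximum $\max_{s \in (0,1)} (1 - s)(R + \epsilon) - \phi(s|P\|Q)$ is realized with $s > 0$. Since $\min\{A, B\} < A$ implies that $\min\{A, B\} = B$, we have

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\} \le \max_{s \in (0,1)} (1 - s)(R + \epsilon) - \phi(s|P\|Q).$$

Taking the limit $\epsilon \to 0$, we obtain (3.128).

Conversely, we assume that (3.128) holds. Replacing $\epsilon$ by $-\epsilon$, we have

$$\min\Bigl\{R - \epsilon + \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\},\ \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\}\Bigr\}$$
$$\le \max_{s \in (0,1)} (1 - s)(R - \epsilon) - \phi(s|P\|Q) \overset{(a)}{<} \max_{s \in (0,1)} (1 - s)R - \phi(s|P\|Q)$$
$$= \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n}\{e^{-nR}\rho^{\otimes n} > \sigma^{\otimes n}\},$$

where the strict inequality (a) follows from the fact that the maximum $\max_{s \in (0,1)} (1 - s)(R - \epsilon) - \phi(s|P\|Q)$ is realized with $s < 1$. Since $\min\{A, B\} < B$ implies that $\min\{A, B\} = A$, we have

$$R - \epsilon + \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} \le \max_{s \in (0,1)} (1 - s)R - \phi(s|P\|Q),$$

which implies that

$$\lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \rho^{\otimes n}\{e^{-nR}\rho^{\otimes n} \le \sigma^{\otimes n}\} \le \epsilon + \max_{s \in (0,1)} -sR - \phi(s|P\|Q).$$

Taking the limit $\epsilon \to 0$, we obtain (3.127). ∎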


Exercise 


3.48 Show that $\max_{s \in [0,1]} -sR_{s_0} - \phi(s) = -s_0 R_{s_0} - \phi(s_0)$ for $s_0 \in (0, 1)$. Use the fact that $\phi(s)$ is convex.


3.8 Information Inequalities and Proof of Converse 
Part of Stein’s Lemma and Han-Kobayashi Bound 


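In this section, we first prove the converse part of Stein's lemma based on inequality (3.20). After this proof, we show the information inequalities (3.18) and (3.20).

Proof of Lemma 3.7 Applying inequality (3.20) to the two-valued POVM $\{T_n, I - T_n\}$, we have

$$(\mathrm{Tr}\, \rho^{\otimes n} T_n)^{1-s}(\mathrm{Tr}\, \sigma^{\otimes n} T_n)^{s} \le (\mathrm{Tr}\, \rho^{\otimes n} T_n)^{1-s}(\mathrm{Tr}\, \sigma^{\otimes n} T_n)^{s} + (\mathrm{Tr}\, \rho^{\otimes n}(I - T_n))^{1-s}(\mathrm{Tr}\, \sigma^{\otimes n}(I - T_n))^{s} \le e^{n\phi(s|\rho\|\sigma)} \tag{3.137}$$

for $s \le 0$. Hence,

$$\frac{1 - s}{n} \log(\mathrm{Tr}\, \rho^{\otimes n} T_n) + \frac{s}{n} \log(\mathrm{Tr}\, \sigma^{\otimes n} T_n) \le \phi(s|\rho\|\sigma). \tag{3.138}$$

Solving the above inequality with respect to $-\frac{1}{n} \log(\mathrm{Tr}\, \rho^{\otimes n} T_n)$, we have

$$-\frac{1}{n} \log(\mathrm{Tr}\, \rho^{\otimes n} T_n) \ge \frac{-\phi(s|\rho\|\sigma) - s\bigl(-\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n\bigr)}{1 - s}. \tag{3.139}$$

When $r = \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n$, taking the limit, we obtain

$$\lim_{n\to\infty} -\frac{1}{n} \log(\mathrm{Tr}\, \rho^{\otimes n} T_n) \ge \frac{-\phi(s|\rho\|\sigma) - sr}{1 - s}. \tag{3.140}$$

When $r > D(\rho\|\sigma)$, the equation $-\phi'(0) = D(\rho\|\sigma)$ implies $\frac{\phi(s_0)}{-s_0} = \frac{\phi(s_0) - \phi(0)}{-s_0} < r$ for an appropriate $s_0 < 0$. Therefore, $\frac{-\phi(s_0) - s_0 r}{1 - s_0} = \frac{-s_0}{1 - s_0}\Bigl(r - \frac{\phi(s_0)}{-s_0}\Bigr) > 0$, and we can show that $\lim_{n\to\infty} \mathrm{Tr}\, \rho^{\otimes n} T_n = 0$. ∎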
N— OO 


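Proof of Han-Kobayashi bound (3.101) In the above proof, when $r = \lim_{n\to\infty} -\frac{1}{n} \log \mathrm{Tr}\, \sigma^{\otimes n} T_n$, we have

$$B^*(r|\rho\|\sigma) \ge \sup_{s \le 0} \frac{-sr - \phi(s)}{1 - s}. \tag{3.141}$$

Since the quantity $\tilde{\phi}(s|\rho\|\sigma)$ satisfies the information processing inequality with respect to POVM as (3.22), similar to (3.141), we can show the inequality

$$B^*(r|\rho\|\sigma) \ge \sup_{s \le 0} \frac{-sr - \tilde{\phi}(s)}{1 - s}. \tag{3.142}$$

As is shown in Exercise 3.54, we can show the opposite inequality in the classical case, in which $\tilde{\phi}(s) = \phi(s)$. We choose the POVM $M$ on the tensor product space $\mathcal{H}^{\otimes m}$ such that the elements of $M$ are commutative with $\kappa_{\sigma^{\otimes m}}(\rho^{\otimes m})$. Then, $\phi(s|P^M_{\rho^{\otimes m}}\|P^M_{\sigma^{\otimes m}}) = \phi(s|\kappa_{\sigma^{\otimes m}}(\rho^{\otimes m})\|\sigma^{\otimes m})$. We apply the classical result to the distributions $P^M_{\rho^{\otimes m}}$ and $P^M_{\sigma^{\otimes m}}$. Then, we obtain

$$B^*(r|\rho\|\sigma) \le \frac{1}{m} B^*(mr|\rho^{\otimes m}\|\sigma^{\otimes m}) \le \sup_{s \le 0} \frac{-sr - \frac{1}{m}\phi(s|\kappa_{\sigma^{\otimes m}}(\rho^{\otimes m})\|\sigma^{\otimes m})}{1 - s}. \tag{3.143}$$

Hence, by taking the limit, the above relation and (3.17) yield the inequality opposite to (3.142), which implies [39]

$$B^*(r|\rho\|\sigma) = \sup_{s \le 0} \frac{-sr - \tilde{\phi}(s)}{1 - s}. \tag{3.144}$$

The remaining argument concerning $B^*(r|\rho\|\sigma)$ is

$$B^*(r|\rho\|\sigma) \le \min_{\tau : D(\tau\|\sigma) = r} D(\tau\|\rho), \tag{3.145}$$

which is shown in Exercise 3.56. ∎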


Next, we prove information inequalities (3.17), (3.18), (3.20), and (3.22). For this 
purpose, we require two lemmas (Lemmas 3.9 and 3.10). 


Lemma 3.9 Let $X$ be a Hermitian matrix in a $d$-dimensional space. The Hermitian matrix $X^{\otimes n}$ given by the tensor product of $X$ has at most $(n + 1)^{d-1}$ distinct eigenvalues, i.e., $|E_{X^{\otimes n}}| \le (n + 1)^{d-1}$.


Proof Let $X = \sum_{i=1}^d x^i |u_i\rangle\langle u_i|$. Then, the eigenvalues of $X^{\otimes n}$ may be written as $(x^1)^{j_1} \cdots (x^d)^{j_d}$ $(n \ge j_i \ge 0)$. The possible values of $(j_1, \ldots, j_d)$ are limited to at most $(n + 1)^{d-1}$ values because $j_d$ is decided from the other values $j_1, \ldots, j_{d-1}$. ∎

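Lemma 3.10 ([40]) For a PVM $M$, any positive matrix $\rho$ and the pinching map $\kappa_M$ defined in (1.13) satisfy

$$|M|\kappa_M(\rho) \ge \rho. \tag{3.146}$$


Proof We first show (3.146) for when $\rho$ is a pure state. Let us consider the case where $|M| = k$, and its probability space is $\{1, \ldots, k\}$. Then, the Schwarz inequality yields

$$k\langle v|\kappa_M(|u\rangle\langle u|)|v\rangle = \Bigl(\sum_{i=1}^k 1\Bigr)\Bigl(\sum_{i=1}^k \langle v|M_i|u\rangle\langle u|M_i|v\rangle\Bigr) \ge \Bigl|\sum_{i=1}^k \langle v|M_i|u\rangle\Bigr|^2 = |\langle v|I|u\rangle|^2 = \langle v|u\rangle\langle u|v\rangle.$$

Therefore, we obtain $|M|\kappa_M(|u\rangle\langle u|) \ge |u\rangle\langle u|$. Next, consider the case where $\rho = \sum_j \rho^j |u_j\rangle\langle u_j|$. Then,

$$|M|\kappa_M(\rho) - \rho = |M|\kappa_M\Bigl(\sum_j \rho^j |u_j\rangle\langle u_j|\Bigr) - \sum_j \rho^j |u_j\rangle\langle u_j| = \sum_j \rho^j\bigl(|M|\kappa_M(|u_j\rangle\langle u_j|) - |u_j\rangle\langle u_j|\bigr) \ge 0,$$

from which we obtain (3.146). ∎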
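
The pinching inequality (3.146) is easy to test numerically. The sketch below assumes that the pinching map of (1.13) acts as $\kappa_M(\rho) = \sum_i M_i \rho M_i$, and it builds an illustrative PVM with three projectors of ranks 2, 1, and 1 on a four-dimensional space from a random unitary. The state, the PVM, and the function names are choices made for this example.

```python
import numpy as np

def random_density_matrix(d, rng):
    """Random full-rank density matrix (illustrative helper)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def pinching(rho, projectors):
    """Pinching map: sum of M_i rho M_i over the projectors of the PVM."""
    return sum(m @ rho @ m for m in projectors)

rng = np.random.default_rng(2)
d = 4
rho = random_density_matrix(d, rng)

# PVM with ranks (2, 1, 1) in a random orthonormal basis.
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
q, _ = np.linalg.qr(g)
blocks = [q[:, :2], q[:, 2:3], q[:, 3:4]]
projectors = [b @ b.conj().T for b in blocks]

gap = len(projectors) * pinching(rho, projectors) - rho
min_eig = np.linalg.eigvalsh(gap).min()
print(min_eig)            # nonnegative up to rounding if (3.146) holds
```

The factor $|M|$ is the number of PVM elements (three here), not the dimension, which is why the estimate remains useful for tensor powers when combined with Lemma 3.9.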




We are now ready to prove information inequalities (3.18) and (3.20). In what follows, 
these inequalities are proved only for densities p > 0 and o > 0. This is sufficient 
for the general ae el to the following reason. First, we apply these inequalities 
to two densities a= = (p + l)(1+de)~!ando, © (o t+eN(1+de)7!. Taking the 
limit € — 0, we obtain these inequalities in the general case. 

Proofs of (3.18) and (3.20) Inequality (3.18) follows from (3.20) by taking the limits 
lim,_. eslele) and lim,_, cual Fe 3 
Step ff Firstly, we show 


. Hence, we prove (3.20). 


P(slpllo) = O(s|Ko(p)|12). (3.147) 


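Let $\sigma = \sum_j \sigma_j E_{\sigma,j}$ be the spectral decomposition of $\sigma$ and $E_{\sigma,j}\rho E_{\sigma,j} = \sum_k \rho_{k,j} E_{k,j}$ be that of $E_{\sigma,j}\rho E_{\sigma,j}$. Hence, $\kappa_\sigma(\rho) = \sum_{k,j} \rho_{k,j} E_{k,j}$. Since $E_{k,j}\rho E_{k,j} = \rho_{k,j} E_{k,j}$, it follows that $\mathrm{Tr}\,\rho \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}} = \rho_{k,j}$. For $s \le 0$, applying Inequality (3.32) (the quantum version of Jensen inequality) with the Hermitian matrix $\rho$ and the density matrix $\frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}$, we have

$$ \Big(\mathrm{Tr}\,\rho \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}\Big)^{1-s} \le \mathrm{Tr}\,\rho^{1-s} \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}. \tag{3.148} $$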


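Thus,

$$ \mathrm{Tr}\,\sigma^s \kappa_\sigma(\rho)^{1-s} = \mathrm{Tr} \sum_{k,j} \sigma_j^s \rho_{k,j}^{1-s} E_{k,j} = \sum_{k,j} \sigma_j^s\, \mathrm{Tr}\,E_{k,j}\, \rho_{k,j}^{1-s} $$
$$ = \sum_{k,j} \sigma_j^s\, \mathrm{Tr}\,E_{k,j} \Big(\mathrm{Tr}\,\rho \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}\Big)^{1-s} \le \sum_{k,j} \sigma_j^s\, \mathrm{Tr}\,E_{k,j}\, \mathrm{Tr}\,\rho^{1-s} \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}} $$
$$ = \sum_{k,j} \sigma_j^s\, \mathrm{Tr}\,\rho^{1-s} E_{k,j} = \mathrm{Tr}\,\sigma^s \rho^{1-s}, $$

which implies (3.147).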
Step 2: Next, we show

$$ \phi(s|\kappa_\sigma(\rho)\|\sigma) \ge \phi(s|P^M_\rho\|P^M_\sigma) - (1-s)\log|E_\sigma|. \tag{3.149} $$


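For any POVM $M = \{M_i\}$, we define the POVMs $M' = \{M'_{i,j,k}\}$ and $M'' = \{M''_i\}$ by $M'_{i,j,k} := E_{k,j} M_i E_{k,j}$ and $M''_i := \sum_{k,j} M'_{i,j,k}$, respectively. Then, $\mathrm{Tr}\,\sigma M'_{i,j,k} = \sigma_j\, \mathrm{Tr}\,E_{k,j} M_i E_{k,j}$ and $\mathrm{Tr}\,\rho M'_{i,j,k} = \rho_{k,j}\, \mathrm{Tr}\,E_{k,j} M_i E_{k,j}$. Thus,

$$ \mathrm{Tr}\,\kappa_\sigma(\rho)^{1-s}\sigma^s = \sum_{i,j,k} (\mathrm{Tr}\,\rho M'_{i,j,k})^{1-s} (\mathrm{Tr}\,\sigma M'_{i,j,k})^{s} \ge \sum_i (\mathrm{Tr}\,\rho M''_i)^{1-s} (\mathrm{Tr}\,\sigma M''_i)^{s}, $$

where the last inequality follows from the monotonicity in the classical case. In addition, $\mathrm{Tr}\,\sigma M''_i = \sum_{k,j} \mathrm{Tr}\,\sigma_j E_{k,j} M_i E_{k,j} = \mathrm{Tr}\,\sigma M_i$, and $\mathrm{Tr}\,\rho M''_i = \sum_{k,j} \mathrm{Tr}\,\rho_{k,j} E_{k,j} M_i E_{k,j} = \mathrm{Tr}\,\kappa_\sigma(\rho) M_i$. Lemma 3.10 ensures that

$$ |E_\sigma|^{1-s} (\mathrm{Tr}\,\kappa_\sigma(\rho) M_i)^{1-s} \ge (\mathrm{Tr}\,\rho M_i)^{1-s}, $$

which implies (3.149).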
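Step 3: Next, we consider the tensor product case. Applying (3.147) and (3.149) to the case with $\rho^{\otimes n}$ and $\sigma^{\otimes n}$, we have

$$ \phi(s|\rho^{\otimes n}\|\sigma^{\otimes n}) \ge \phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n}) \ge \phi(s|P^{M^{\otimes n}}_{\rho^{\otimes n}}\|P^{M^{\otimes n}}_{\sigma^{\otimes n}}) - (1-s)\log|E_{\sigma^{\otimes n}}|. $$

Since $\phi(s|\rho^{\otimes n}\|\sigma^{\otimes n}) = n\phi(s|\rho\|\sigma)$ and $\phi(s|P^{M^{\otimes n}}_{\rho^{\otimes n}}\|P^{M^{\otimes n}}_{\sigma^{\otimes n}}) = n\phi(s|P^M_\rho\|P^M_\sigma)$, the inequality

$$ \phi(s|\rho\|\sigma) \ge \frac{\phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{n} \ge \phi(s|P^M_\rho\|P^M_\sigma) - (1-s)\frac{\log|E_{\sigma^{\otimes n}}|}{n} $$

holds. The convergence $\frac{\log|E_{\sigma^{\otimes n}}|}{n} \to 0$ follows from Lemma 3.9. Thus, we obtain

$$ \phi(s|\rho\|\sigma) \ge \lim_{n\to\infty} \frac{\phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{n} \ge \phi(s|P^M_\rho\|P^M_\sigma). \tag{3.150} $$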


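Note that the convergence $\lim_{n\to\infty} \frac{\phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{n}$ is guaranteed by Lemma A.1 because $\phi(s|\kappa_{\sigma^{\otimes (n+m)}}(\rho^{\otimes (n+m)})\|\sigma^{\otimes (n+m)}) \ge \phi(s|\kappa_{\sigma^{\otimes m}}(\rho^{\otimes m})\|\sigma^{\otimes m}) + \phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n}) - d\log|E_\sigma|$. In addition, as is discussed in Exercise 5.22, the equality in $\phi(s|\rho\|\sigma) \ge \tilde{\phi}(s|\rho\|\sigma)$ does not necessarily hold for $s \le -1$. ∎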


Proofs of (3.17) and (3.22) By using (3.150), (3.17) implies (3.22). So, we show only 
(3.17). 

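Step 1: We show (3.17) in the case with $s \le 0$. We employ the notations given in the above proof. Similar to (3.148), applying Inequality (3.32) (the quantum version of Jensen inequality) with the Hermitian matrix $\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}$ and the density matrix $\frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}$, we have

$$ \sigma_j^s \Big(\mathrm{Tr}\,\rho \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}\Big)^{1-s} = \Big(\mathrm{Tr}\,\sigma^{\frac{s}{1-s}} \rho \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}\Big)^{1-s} = \Big(\mathrm{Tr}\,\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}} \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}\Big)^{1-s} \le \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s} \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}. $$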
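Thus,

$$ \mathrm{Tr}\,\sigma^s \kappa_\sigma(\rho)^{1-s} = \sum_{k,j} \mathrm{Tr}\,E_{k,j}\, \sigma_j^s \Big(\mathrm{Tr}\,\rho \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}}\Big)^{1-s} \le \sum_{k,j} \mathrm{Tr}\,E_{k,j}\, \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s} \frac{E_{k,j}}{\mathrm{Tr}\,E_{k,j}} $$
$$ = \sum_{k,j} \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s} E_{k,j} = \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s}, $$

which implies

$$ \tilde{\phi}(s|\rho\|\sigma) \ge \phi(s|\kappa_\sigma(\rho)\|\sigma). \tag{3.151} $$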


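Conversely, using (3.146), we have

$$ \sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}} \le \sigma^{\frac{s}{2(1-s)}} |E_\sigma| \kappa_\sigma(\rho)\, \sigma^{\frac{s}{2(1-s)}}. \tag{3.152} $$

Thus, Lemma A.13 yields that

$$ \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s} \le \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} |E_\sigma| \kappa_\sigma(\rho)\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s}, $$

which implies

$$ \phi(s|\kappa_\sigma(\rho)\|\sigma) + (1-s)\log|E_\sigma| \ge \tilde{\phi}(s|\rho\|\sigma). \tag{3.153} $$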


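By considering the tensor product case, (3.151) and (3.153) imply

$$ \frac{\phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{n} + \frac{(1-s)\log|E_{\sigma^{\otimes n}}|}{n} \ge \tilde{\phi}(s|\rho\|\sigma) \ge \frac{\phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{n}. $$

Taking the limit $n \to \infty$, we obtain

$$ \tilde{\phi}(s|\rho\|\sigma) = \lim_{n\to\infty} \frac{\phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{n}, \tag{3.154} $$

which implies (3.17) for $s \le 0$. Using (3.150), we obtain (3.22) for $s \le 0$.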

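Step 2: Next, we show (3.17) in the case with $s \in (0,1)$. We notice that (3.152) holds even for $s \in (0,1)$. Now, we choose the basis $\{|e_i\rangle\}$ such that $\kappa_\sigma(\rho) = \sum_i p^i |e_i\rangle\langle e_i|$ and $\kappa_\sigma(|e_i\rangle\langle e_i|) = |e_i\rangle\langle e_i|$. Hence, since $x \mapsto x^{1-s}$ is concave, we have

$$ e^{-s\underline{D}_{1-s}(\rho\|\sigma)} = \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s} = \sum_i \langle e_i|\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s}|e_i\rangle $$
$$ \le \sum_i \big(\langle e_i|\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}|e_i\rangle\big)^{1-s} = \sum_i \big(\langle e_i|\sigma^{\frac{s}{2(1-s)}}|e_i\rangle \langle e_i|\rho|e_i\rangle \langle e_i|\sigma^{\frac{s}{2(1-s)}}|e_i\rangle\big)^{1-s} $$
$$ = \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \kappa_\sigma(\rho)\, \sigma^{\frac{s}{2(1-s)}}\big)^{1-s} = e^{-sD_{1-s}(\kappa_\sigma(\rho)\|\sigma)}. \tag{3.155} $$

Since $x \mapsto -x^{-s}$ is matrix monotone, we have

$$ e^{-s\underline{D}_{1-s}(\rho\|\sigma)} = \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)^{-s} $$
$$ \ge \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}} \rho\, \sigma^{\frac{s}{2(1-s)}}\big)\big(\sigma^{\frac{s}{2(1-s)}} |E_\sigma| \kappa_\sigma(\rho)\, \sigma^{\frac{s}{2(1-s)}}\big)^{-s} $$
$$ = |E_\sigma|^{-s} e^{-sD_{1-s}(\kappa_\sigma(\rho)\|\sigma)}. \tag{3.156} $$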
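Combining (3.155) and (3.156), we have

$$ D_{1-s}(\kappa_\sigma(\rho)\|\sigma) + \log|E_\sigma| \ge \underline{D}_{1-s}(\rho\|\sigma) \ge D_{1-s}(\kappa_\sigma(\rho)\|\sigma). $$

Considering the tensor product case, we obtain

$$ \frac{1}{n} D_{1-s}(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n}) + \frac{1}{n}\log|E_{\sigma^{\otimes n}}| \ge \underline{D}_{1-s}(\rho\|\sigma) \ge \frac{1}{n} D_{1-s}(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n}). $$

Taking the limit $n \to \infty$, we obtain

$$ \underline{D}_{1-s}(\rho\|\sigma) = \lim_{n\to\infty} \frac{1}{n} D_{1-s}(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n}), $$

which implies (3.17) for $s \in (0,1)$. ∎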
Exercises 


From Exercise 3.49 to Exercise 3.55, we consider only the classical case with $p$ and $\bar{p}$. So, we abbreviate $\phi(s|p\|\bar{p})$ by $\phi(s)$ (the results of these exercises are summarized in Table 3.1).


3.49 Show that

$$ \lim_{s\to-\infty} -\phi'(s) = D_{\max}(p\|\bar{p}). \tag{3.157} $$


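Table 3.1 Behaviors of $\phi(s)$, $\phi'(s)$, $s\phi'(s) - \phi(s)$, and $(s-1)\phi'(s) - \phi(s)$

s                           | $-\infty$                     |              | 0                   |            | 1
$\phi(s)$                   | $+\infty$                     | $\searrow$   | 0                   | $\cup$     | 0
$\phi'(s)$                  | $-D_{\max}(p\|\bar{p})$       | $\nearrow$   | $-D(p\|\bar{p})$    | $\nearrow$ | $D(\bar{p}\|p)$
$s\phi'(s) - \phi(s)$       | $-\log P(p\|\bar{p})$         | $\searrow$   | 0                   | $\nearrow$ | $D(\bar{p}\|p)$
$(s-1)\phi'(s) - \phi(s)$   | $-\log \bar{P}(p\|\bar{p})$   | $\searrow$   | $D(p\|\bar{p})$     | $\searrow$ | 0

3.50 Define $P(p\|\bar{p}) := \sum_{x: \log p(x) - \log \bar{p}(x) = D_{\max}(p\|\bar{p})} p(x)$ and $\bar{P}(p\|\bar{p}) := \sum_{x: \log p(x) - \log \bar{p}(x) = D_{\max}(p\|\bar{p})} \bar{p}(x)$. Show that

$$ \lim_{s\to-\infty} D(p_s\|\bar{p}) = \lim_{s\to-\infty} (s-1)\phi'(s) - \phi(s) = -\log \bar{P}(p\|\bar{p}) = D_{\max}(p\|\bar{p}) - \log P(p\|\bar{p}), \tag{3.158} $$
$$ \lim_{s\to-\infty} D(p_s\|p) = \lim_{s\to-\infty} s\phi'(s) - \phi(s) = -\log P(p\|\bar{p}). \tag{3.159} $$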


3.51 Show that

$$ \frac{ds_r}{dr} = \frac{1}{(s_r - 1)\phi''(s_r)}. \tag{3.160} $$
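3.52 Show that

$$ \sup_{s\le 0} \frac{-sr - \phi(s|p\|\bar{p})}{1-s} = \min_{q: D(q\|\bar{p}) \ge r} D(q\|p) = \min_{q: D(q\|\bar{p}) \le r} D(q\|p) + r - D(q\|\bar{p}) \tag{3.161} $$

for $r \in [D(p\|\bar{p}), -\log \bar{P}(p\|\bar{p})]$.
(a) Show that there uniquely exists $s_r \le 0$ such that $D(p_{s_r}\|p_1) = r$ for $r \in [D(p\|\bar{p}), -\log \bar{P}(p\|\bar{p}))$.
(b) Show that

$$ \min_{q: D(q\|\bar{p}) = r} D(q\|p) = D(p_{s_r}\|p). \tag{3.162} $$

(c) Show that

$$ \min_{q: D(q\|\bar{p}) \le r} D(q\|p) + r - D(q\|\bar{p}) = \min_{q: D(q\|\bar{p}) \ge r} D(q\|p) = D(p_{s_r}\|p). \tag{3.163} $$

(d) Show that

$$ D(p_{s_r}\|p) = \frac{-s_r r - \phi(s_r)}{1 - s_r}. \tag{3.164} $$

(e) Show that

$$ \frac{d}{ds} \frac{-sr - \phi(s)}{1-s} = \frac{-r + (s-1)\phi'(s) - \phi(s)}{(1-s)^2}. \tag{3.165} $$

(f) Show that

$$ \sup_{s\le 0} \frac{-sr - \phi(s|p\|\bar{p})}{1-s} = \frac{-s_r r - \phi(s_r)}{1 - s_r}. \tag{3.166} $$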
(g) Show (3.161).


3.53 Show that

$$ \sup_{s\le 0} \frac{-sr - \phi(s|p\|\bar{p})}{1-s} = \min_{q: D(q\|\bar{p}) \le r} D(q\|p) + r - D(q\|\bar{p}) = \min_{q} D(q\|p) + r - D(q\|\bar{p}) = r - D_{\max}(p\|\bar{p}) \tag{3.167} $$

for $r > -\log \bar{P}(p\|\bar{p})$.
(a) Show that

$$ \sup_{s\le 0} \frac{-sr - \phi(s|p\|\bar{p})}{1-s} = \lim_{s\to-\infty} \frac{-sr - \phi(s|p\|\bar{p})}{1-s} = r - D_{\max}(p\|\bar{p}). \tag{3.168} $$

(b) Show that

$$ D(q\|p) + r - D(q\|\bar{p}) \ge r - D_{\max}(p\|\bar{p}). \tag{3.169} $$

(c) Show that

$$ \min_{q: D(q\|\bar{p}) \le r} D(q\|p) + r - D(q\|\bar{p}) \le r - D_{\max}(p\|\bar{p}). \tag{3.170} $$


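3.54 Show the inequality opposite to (3.141) following the steps below. Therefore, we obtain the equation

$$ B^*(r|p\|\bar{p}) = \sup_{s\le 0} \frac{-sr - \phi(s|p\|\bar{p})}{1-s}. \tag{3.171} $$

(a) Show the following equations for $\phi(s) = \phi(s|p\|\bar{p})$ using Cramér's theorem.

$$ \lim_{n\to\infty} -\frac{1}{n} \log p^n\Big\{-\frac{1}{n}\log\Big(\frac{p^n(x^n)}{\bar{p}^n(x^n)}\Big) \le R\Big\} = \max_{s\le 0}\; sR - \phi(s), \tag{3.172} $$
$$ \lim_{n\to\infty} -\frac{1}{n} \log \bar{p}^n\Big\{-\frac{1}{n}\log\Big(\frac{p^n(x^n)}{\bar{p}^n(x^n)}\Big) \le R\Big\} = \max_{s\le 1}\; -(1-s)R - \phi(s). \tag{3.173} $$

(b) Show that

$$ s\phi'(s) - \phi(s) = \max_{s_0\le 0} (s_0\phi'(s) - \phi(s_0)) \quad \text{for } s \le 0, \tag{3.174} $$
$$ (s-1)\phi'(s) - \phi(s) = \max_{s_0\le 1} ((s_0 - 1)\phi'(s) - \phi(s_0)) \quad \text{for } s \le 1. \tag{3.175} $$

(c) Show that

$$ r = \max_{s\le 1}\; (s-1)\phi'(s_r) - \phi(s), \tag{3.176} $$
$$ \sup_{s\le 0} \frac{-sr - \phi(s)}{1-s} = s_r\phi'(s_r) - \phi(s_r) = \max_{s\le 0}\; s\phi'(s_r) - \phi(s). \tag{3.177} $$

(d) Assume that $D(p\|\bar{p}) \le r \le -\log \bar{P}(p\|\bar{p})$. Show the inequality $B^*(r|p\|\bar{p}) \le \sup_{s\le 0} \frac{-sr - \phi(s|p\|\bar{p})}{1-s}$ by using (3.166), (3.176), and (3.177).

(e) Assume that $r > -\log \bar{P}(p\|\bar{p})$. Show the inequality $B^*(r|p\|\bar{p}) \le \sup_{s\le 0} \frac{-sr - \phi(s|p\|\bar{p})}{1-s}$ by using (3.167).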


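3.55 Show that

$$ \frac{dB^*(r|p\|\bar{p})}{dr} = \frac{s_r}{s_r - 1}, \qquad \frac{d^2 B^*(r|p\|\bar{p})}{dr^2} = \frac{-1}{(s_r - 1)^3 \phi''(s_r)} \ge 0, \tag{3.178} $$

which implies the convexity of $B^*(r|p\|\bar{p})$.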
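Now, we proceed to the quantum case with $\rho$ and $\sigma$.
3.56 Show the following inequality by following the steps below.

$$ B^*(r|\rho\|\sigma) \le \inf_{\tau: D(\tau\|\sigma) > r} D(\tau\|\rho) = \min_{\tau: D(\tau\|\sigma) \ge r} D(\tau\|\rho). \tag{3.179} $$

(a) For any state $\tau$, show that there exists a sequence $\{T_n\}$ such that $\lim_{n\to\infty} \mathrm{Tr}\,\tau^{\otimes n}(I - T_n) = 1$ and $\lim_{n\to\infty} -\frac{1}{n}\log \mathrm{Tr}\,\sigma^{\otimes n} T_n = r$.

(b) Show that the above sequence $\{T_n\}$ satisfies $\lim_{n\to\infty} -\frac{1}{n}\log \mathrm{Tr}\,\rho^{\otimes n} T_n \le D(\tau\|\rho)$.

(c) Show (3.179).


3.57 Let a sequence of tests $\{T_n\}$ satisfy $R = \lim -\frac{1}{n}\log \mathrm{Tr}\,\rho^{\otimes n} T_n$ and $r \le \lim -\frac{1}{n}\log \mathrm{Tr}\,\sigma^{\otimes n}(I - T_n)$. Show that $R \le D(\tau\|\rho)$ when $D(\tau\|\sigma) \le r$ using Lemma 3.7 twice. That is, show that

$$ B(r|\rho\|\sigma) \le \inf_{\tau: D(\tau\|\sigma) < r} D(\tau\|\rho) = \min_{\tau: D(\tau\|\sigma) \le r} D(\tau\|\rho). $$


3.58 Show $\tilde{\phi}(s|\rho\|\sigma) = \lim_{n\to\infty} \frac{1}{n} \max_{M} \phi(s|P^{M}_{\rho^{\otimes n}}\|P^{M}_{\sigma^{\otimes n}})$ for $s \le 0$.


3.59 To prove (3.16), show that $\lim_{s\to\infty} \frac{\tilde{\phi}(-s|\rho\|\sigma)}{s} = D_{\max}(\rho\|\sigma)$ following the steps below.

(a) Show that $D_{\max}(\rho\|\sigma) \ge D_{\max}(\kappa_\sigma(\rho)\|\sigma)$.

(b) Show that $\lim_{n\to\infty} \frac{1}{n} D_{\max}(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n}) = D_{\max}(\rho\|\sigma)$.

(c) Show that $\lim_{s\to\infty} \frac{\tilde{\phi}(-s|\rho\|\sigma)}{s} = D_{\max}(\rho\|\sigma)$.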


3.60 Assume that a rank-one PVM $M = \{M_i\}$ is not commutative with $\rho$ and is commutative with $\sigma$. Show that

$$ \phi(s|\kappa_{\sigma^{\otimes 2}}(\rho^{\otimes 2})\|\sigma^{\otimes 2}) > 2\phi(s|P^M_\rho\|P^M_\sigma) \tag{3.180} $$

for $s < 0$ by using Exercise 1.35.


3.61 Assume that a rank-one PVM $M = \{M_i\}$ is commutative with $\rho$ and is not commutative with $\sigma$. Show (3.180) for $s < 0$.


3.62 Assume that $\sigma$ is not commutative with $\rho$. Show that

$$ \tilde{\phi}(s|\rho\|\sigma) > \phi(s|P^M_\rho\|P^M_\sigma) \tag{3.181} $$

for $s < 0$ and any POVM $M$ following the steps below. Therefore, there exists a POVM such that $\tilde{\phi}(s|\rho\|\sigma) = \phi(s|P^M_\rho\|P^M_\sigma)$ if and only if $\sigma$ is commutative with $\rho$.
(a) Show that it is sufficient to show (3.181) for a rank-one PVM M. (Hint: Use 
Theorem 4.5 given in Sect. 4.7.) 

(b) Show (3.181) in the above case by using Exercises 3.60 and 3.61. 


3.9 Proof of Theorem 3.1 


In this proof, we only consider the case in which there exists an element $x \in L$ such that $A(x) = b$.⁵ Otherwise, since both sides are equal to $-\infty$, the theorem holds. When $x \in L$ satisfies $A(x) = b$ and $y$ satisfies $A^*(y) - c \in L^*$, we have $0 \le \langle A^*(y) - c, x\rangle = \langle y, A(x)\rangle - \langle c, x\rangle = \langle y, b\rangle - \langle c, x\rangle$. Hence, we can check that

$$ \max_{x\in V_1} \{\langle c, x\rangle \,|\, x \in L, A(x) = b\} \le \min_{y\in V_2} \{\langle y, b\rangle \,|\, A^*(y) - c \in L^*\}. \tag{3.182} $$

Furthermore,

$$ \min_{y\in V_2} \{\langle y, b\rangle \,|\, A^*(y) - c \in L^*\} = \min_{(\mu,y)\in \mathbb{R}\times V_2} \{\mu \,|\, \exists y \in V_2, \forall x \in L, \langle y, b\rangle - \langle A^*(y) - c, x\rangle \le \mu\}. $$

This equation can be checked as follows. When $y \in V_2$ satisfies $A^*(y) - c \in L^*$, the real number $\mu = \langle y, b\rangle$ satisfies the condition on the right-hand side (RHS). Hence, we obtain the $\ge$ part. Next, we consider a pair $(\mu, y)$ satisfying the condition on the RHS. Then, we can show by contradiction that $\langle A^*(y) - c, x\rangle$ is nonnegative for all $x \in L$. Assume that there exists an element $x \in L$ such that $\langle A^*(y) - c, x\rangle$ is negative. By choosing a sufficiently large number $t > 0$, we have $tx \in L$, but $\langle y, b\rangle - \langle A^*(y) - c, tx\rangle \le \mu$ does not hold. This is a contradiction. Hence $A^*(y) - c \in L^*$, and choosing $x = 0$ gives $\langle y, b\rangle \le \mu$. This proves the $\le$ part.


⁵ Our proof follows [19].
Let $\eta_0 := \max_{x\in V_1} \{\langle c, x\rangle \,|\, x \in L, A(x) = b\}$. Then $(\eta_0, 0)$ is a point that lies on the boundary of the convex set $\{(\langle c, x\rangle, A(x) - b)\}_{x\in L} \subset \mathbb{R} \times V_2$. Choosing an appropriate $y_0 \in V_2$ and noting that $(1, -y_0) \in \mathbb{R} \times V_2$, we have

$$ \eta_0 = \eta_0 - \langle y_0, 0\rangle \ge \langle c, x\rangle - \langle y_0, A(x) - b\rangle, \quad \forall x \in L. $$

From this fact we have

$$ \eta_0 \ge \min_{(\mu,y)\in \mathbb{R}\times V_2} \{\mu \,|\, \exists y \in V_2, \forall x \in L, \langle y, b\rangle - \langle A^*(y) - c, x\rangle \le \mu\}. $$

This proves the reverse inequality of (3.182) and completes the proof.
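A toy instance may help to see both sides of (3.182) meet. The following is a minimal sketch under assumptions that are not from the text: $V_1 = \mathbb{R}^2$, $V_2 = \mathbb{R}$, $L = L^*$ the nonnegative quadrant, $A(x) = x_1 + x_2$ (so $A^*(y) = (y, y)$), $b = 1$, and $c = (1, 2)$. Both optimizations are done by a plain grid scan rather than by an LP solver.

```python
def primal_value(c, b, steps=100):
    """max <c,x> over x >= 0 with x1 + x2 = b, scanning the feasible segment."""
    best = float("-inf")
    for i in range(steps + 1):
        x1 = b * i / steps
        x2 = b - x1
        best = max(best, c[0] * x1 + c[1] * x2)
    return best

def dual_value(c, b, y_max=10, steps=1000):
    """min y*b over y in [0, y_max] with A*(y) - c = (y - c1, y - c2) >= 0."""
    best = float("inf")
    for i in range(steps + 1):
        y = y_max * i / steps
        if y - c[0] >= 0 and y - c[1] >= 0:
            best = min(best, y * b)
    return best

c, b = (1.0, 2.0), 1.0   # hypothetical data (assumption)
print(primal_value(c, b), dual_value(c, b))
```

In this instance the primal maximum is attained at $x = (0, 1)$ and the dual minimum at $y = 2$, so the two sides of (3.182) coincide.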


3.10 Historical Note 


The Rényi relative entropy $D_{1+s}(\rho\|\sigma)$ was introduced by Petz [41] as quantum f-divergence. Recently, another kind of Rényi relative entropy $\underline{D}_{1+s}(\rho\|\sigma)$ was introduced by the papers [4, 5] to connect the fidelity $F(\rho, \sigma)$ and the max relative entropy $D_{\max}(\rho\|\sigma)$, which was introduced by the paper [3]. Based on advanced knowledge, i.e., matrix convex functions (see Section A.4), Petz [41] showed the monotonicity for $D_{1+s}(\rho\|\sigma)$. A different paper [6] showed that for $\underline{D}_{1+s}(\rho\|\sigma)$ by using a more difficult method. In this text, we prove the monotonicity of the relative Rényi entropies $D_{1+s}(\rho\|\sigma)$ ($s \ge 0$) and $\underline{D}_{1+s}(\rho\|\sigma)$ ($s \ge 0$) for a measurement based only on elementary knowledge.

The problem of discriminating two states was treated by Holevo [11] and Helstrom [12]. Its extension to multiple states was discussed by Yuen et al. [17]. If we allow any POVM, the possibility of perfect discrimination is trivial. That is, it is possible only when the hypothesis states are orthogonal to each other. However, if our measurement is restricted to LOCC, its possibility is not trivial. This problem is called local discrimination and has been studied by many researchers recently [42-50].

On the nonperfect discrimination, Chernoff's lemma is essential in the asymptotic setting with two commutative states. However, no results were obtained concerning the quantum case of Chernoff's lemma. Hence, Theorem 3.5 is the first attempt to obtain its quantum extension. Regarding the quantum case of Stein's lemma, many results were obtained, the first by Hiai and Petz [32]. They proved that $B(\rho\|\sigma) = D(\rho\|\sigma)$. The part $B(\rho\|\sigma) \le D(\rho\|\sigma)$ essentially follows from the same discussion as (3.123). They proved the other part $B(\rho\|\sigma) \ge D(\rho\|\sigma)$ by showing the existence of the POVMs $\{M^n\}$ such that

$$ \lim_{n\to\infty} \frac{1}{n} D^{M^n}(\rho^{\otimes n}\|\sigma^{\otimes n}) = D(\rho\|\sigma). \tag{3.183} $$


An impetus for this work was the first meeting between Hiai and Nagaoka in 1990 when they were at the same university (but in different departments). During their discussion, Nagaoka asked about the possibility of extending Stein's lemma to the quantum case. After their achievement, Hayashi [51] proved that there exists a sequence of POVMs $\{M^n\}$ that satisfies (3.97) and depends only on $\sigma$. Hayashi [40] also proved that the asymptotically optimal condition for a measurement in terms of quantum hypothesis testing depends only on $\sigma$. Moreover, Ogawa and Hayashi [52] also derived a lower bound of the exponent of the second error probability. After the first edition of this book, two major breakthroughs were made by Audenaert et al. [13] and Nussbaum and Szkoła [14]. Audenaert et al. [13] showed a very helpful evaluation, given as Lemma 3.3. Although their original proof is rather complicated, Narutaka Ozawa [53] gave a much simpler proof, which is presented in this book. On the other hand, Nussbaum and Szkoła [14] introduced simultaneous distributions $P_{(\rho\|\sigma)}$ and $Q_{(\rho\|\sigma)}$ for two non-commutative density matrices $\rho$ and $\sigma$. Then, they derived a lower bound of error probability. However, this kind of distribution was essentially discussed in Hayashi [40] by considering the pinched state $\kappa_\sigma(\rho)$.

Regarding the strong converse part $B^{\dagger}(\rho\|\sigma) \le D(\rho\|\sigma)$, Ogawa and Nagaoka [7] proved it by deriving the lower bound of the exponent $\sup_{-1\le s\le 0} \frac{-sr - \phi(s|\rho\|\sigma)}{1-s}$, which is equal to the RHS of (3.141) when $s \le 0$ is replaced by $-1 \le s \le 0$. Its behavior is slightly worse for a large value $r$. After this, the same exponent was obtained by Nagaoka [54] in a simpler way. However, these two approaches are based on the monotonicity of the relative Rényi entropy $\phi(s|\rho\|\sigma)$ ($-1 \le s \le 0$). In this text, we apply this monotonicity to Nagaoka's proof. Hence, we derive the better bound $\sup_{s\le 0} \frac{-sr - \phi(s|\rho\|\sigma)}{1-s}$, which was derived by Hayashi [55] using a different method. In addition, the second inequality in (3.100) was first proved in the first version of this book. Further, the first version of this book showed that

$$ B^*(r|\rho\|\sigma) = \sup_{s\le 0} \frac{-sr - \lim_{n\to\infty} \frac{1}{n}\phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})}{1-s} \tag{3.184} $$

by showing the monotonicity for the information quantity $\lim_{n\to\infty} \frac{1}{n}\phi(s|\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\|\sigma^{\otimes n})$. Recently, Mosonyi and Ogawa [39] showed (3.144) by showing the relation (3.154). Furthermore, Nagaoka invented a quantum version of the information spectrum method, and Nagaoka and Hayashi [34] applied it to the simple hypothesis testing of a general sequence of quantum states.

Finally, we should remark that the formulation of hypothesis testing is based on industrial demands. In particular, in order to guarantee product quality, we usually use a test based on random sampling and statistically evaluate the quality. It is natural to apply this method to check the quality of produced maximally entangled states because maximally entangled states are used as resources of quantum information processing. Tsuda et al. formulated this problem using statistical hypothesis testing [56] and demonstrated its usefulness by applying it to maximally entangled states produced by spontaneous parametric down conversion [57]. Further, Hayashi [58] analyzed this problem more extensively from a theoretical viewpoint. However, concerning quantum hypothesis testing, the research on the applied side is not sufficient. Hence, such a study is strongly desired.
3.11 Solutions of Exercises


Exercise 3.1 These can be shown by simple calculations of $D(\rho\|\sigma)$ and $D_{1+s}(\rho\|\sigma)$.
Exercise 3.2

$$ H(\rho_A \otimes \rho_B) = -\mathrm{Tr}\,\rho_A \otimes \rho_B \log(\rho_A \otimes \rho_B) = -\mathrm{Tr}\,\rho_A \otimes \rho_B (\log \rho_A \otimes I_B + I_A \otimes \log \rho_B) $$
$$ = -\mathrm{Tr}\,\rho_A \log \rho_A - \mathrm{Tr}\,\rho_B \log \rho_B = H(\rho_A) + H(\rho_B). $$

(3.27) can be shown by a similar calculation.


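$$ e^{-sD_{1-s}(\rho_A\otimes\rho_B\|\sigma_A\otimes\sigma_B)} = \mathrm{Tr}\,(\rho_A \otimes \rho_B)^{1-s}(\sigma_A \otimes \sigma_B)^{s} = \mathrm{Tr}\,(\rho_A^{1-s} \otimes \rho_B^{1-s})(\sigma_A^{s} \otimes \sigma_B^{s}) $$
$$ = \mathrm{Tr}\,\rho_A^{1-s}\sigma_A^{s} \cdot \mathrm{Tr}\,\rho_B^{1-s}\sigma_B^{s} = e^{-sD_{1-s}(\rho_A\|\sigma_A)} e^{-sD_{1-s}(\rho_B\|\sigma_B)}. $$

(3.29) can be shown by a similar calculation.

$$ e^{D_{\max}(\rho_A\otimes\rho_B\|\sigma_A\otimes\sigma_B)} = \|(\sigma_A \otimes \sigma_B)^{-\frac{1}{2}} (\rho_A \otimes \rho_B) (\sigma_A \otimes \sigma_B)^{-\frac{1}{2}}\| = \|(\sigma_A^{-\frac{1}{2}} \otimes \sigma_B^{-\frac{1}{2}}) (\rho_A \otimes \rho_B) (\sigma_A^{-\frac{1}{2}} \otimes \sigma_B^{-\frac{1}{2}})\| $$
$$ = \|\sigma_A^{-\frac{1}{2}} \rho_A \sigma_A^{-\frac{1}{2}} \otimes \sigma_B^{-\frac{1}{2}} \rho_B \sigma_B^{-\frac{1}{2}}\| = e^{D_{\max}(\rho_A\|\sigma_A)} e^{D_{\max}(\rho_B\|\sigma_B)}. $$

$$ e^{-D_{\min}(\rho_A\otimes\rho_B\|\sigma_A\otimes\sigma_B)} = \mathrm{Tr}\,(\sigma_A \otimes \sigma_B)\{\rho_A \otimes \rho_B > 0\} = \mathrm{Tr}\,(\sigma_A \otimes \sigma_B)\{\rho_A > 0\} \otimes \{\rho_B > 0\} $$
$$ = \mathrm{Tr}\,\sigma_A\{\rho_A > 0\}\, \mathrm{Tr}\,\sigma_B\{\rho_B > 0\} = e^{-D_{\min}(\rho_A\|\sigma_A)} e^{-D_{\min}(\rho_B\|\sigma_B)}. $$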

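Exercise 3.3 Consider the spectral decomposition $M$ of $X$ and apply Jensen's inequality to $P^M_\rho$.


Exercise 3.4 $\phi'(0|\rho\|\sigma) = \frac{d}{ds}\mathrm{Tr}\,\rho^{1-s}\sigma^s|_{s=0} / \mathrm{Tr}\,\rho^{1-s}\sigma^s|_{s=0}$, and $\frac{d}{ds}\mathrm{Tr}\,\rho^{1-s}\sigma^s|_{s=0} = -\mathrm{Tr}\,\rho^{1-s}\log\rho\,\sigma^s|_{s=0} + \mathrm{Tr}\,\rho^{1-s}\sigma^s\log\sigma|_{s=0} = -\mathrm{Tr}\,\rho\log\rho + \mathrm{Tr}\,\rho\log\sigma$. The other equation can be shown in the same way.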


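Exercise 3.5


(a) $\phi'(s|\rho\|\sigma) = \frac{d}{ds}\mathrm{Tr}\,\rho^{1-s}\sigma^s / \mathrm{Tr}\,\rho^{1-s}\sigma^s$, and $\frac{d}{ds}\mathrm{Tr}\,\rho^{1-s}\sigma^s = -\mathrm{Tr}\,\rho^{1-s}\log\rho\,\sigma^s + \mathrm{Tr}\,\rho^{1-s}\sigma^s\log\sigma = -\mathrm{Tr}\,\log\rho\,\rho^{1-s}\sigma^s + \mathrm{Tr}\,\rho^{1-s}\sigma^s\log\sigma = \mathrm{Tr}\,\rho^{1-s}\sigma^s(-\log\rho + \log\sigma)$.
(d) Use Schwarz's inequality with respect to the inner product $\mathrm{Tr}\,XY^*$ with two vectors $\rho^{(1-s)/2}(\log\sigma - \log\rho)\sigma^{s/2}$ and $\rho^{(1-s)/2}\sigma^{s/2}$.

Exercise 3.6 $\frac{d e^{\tilde{\phi}(s|\rho\|\sigma)}}{ds} = -\mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}}\rho\,\sigma^{\frac{s}{2(1-s)}}\big)^{1-s} \log\big(\sigma^{\frac{s}{2(1-s)}}\rho\,\sigma^{\frac{s}{2(1-s)}}\big) + \mathrm{Tr}\,(1-s)\big(\sigma^{\frac{s}{2(1-s)}}\rho\,\sigma^{\frac{s}{2(1-s)}}\big)^{-s} \cdot \frac{1}{(1-s)^2}\log\sigma \cdot \big(\sigma^{\frac{s}{2(1-s)}}\rho\,\sigma^{\frac{s}{2(1-s)}}\big)$. Hence, $\frac{d\tilde{\phi}(s|\rho\|\sigma)}{ds}\big|_{s=0} = -\mathrm{Tr}\,\rho\log\rho + \mathrm{Tr}\,\rho\log\sigma = -D(\rho\|\sigma)$.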


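Exercise 3.7 $\lim_{s\to\infty} \underline{D}_{1+s}(\rho\|\sigma) = \lim_{s\to\infty} \frac{1}{s}\log\mathrm{Tr}\,\big(\sigma^{-\frac{s}{2(1+s)}}\rho\,\sigma^{-\frac{s}{2(1+s)}}\big)^{1+s} = \lim_{s\to\infty} \frac{1+s}{s}\log\|\sigma^{-\frac{s}{2(1+s)}}\rho\,\sigma^{-\frac{s}{2(1+s)}}\|_{1+s} = \log\|\sigma^{-\frac{1}{2}}\rho\,\sigma^{-\frac{1}{2}}\|$.


Exercise 3.8 For $s \in (0,1)$, applying the Araki-Lieb-Thirring inequality to the case $r = 1-s$, we have

$$ e^{-s\underline{D}_{1-s}(\rho\|\sigma)} = \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}}\rho\,\sigma^{\frac{s}{2(1-s)}}\big)^{1-s} \ge \mathrm{Tr}\,\sigma^{\frac{s}{2}}\rho^{1-s}\sigma^{\frac{s}{2}} = \mathrm{Tr}\,\sigma^{s}\rho^{1-s} = e^{-sD_{1-s}(\rho\|\sigma)}. $$

For $s \le 0$, applying the Araki-Lieb-Thirring inequality to the case $r = 1-s$, we have

$$ e^{-s\underline{D}_{1-s}(\rho\|\sigma)} = \mathrm{Tr}\,\big(\sigma^{\frac{s}{2(1-s)}}\rho\,\sigma^{\frac{s}{2(1-s)}}\big)^{1-s} \le \mathrm{Tr}\,\sigma^{\frac{s}{2}}\rho^{1-s}\sigma^{\frac{s}{2}} = \mathrm{Tr}\,\sigma^{s}\rho^{1-s} = e^{-sD_{1-s}(\rho\|\sigma)}. $$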
Exercise 3.9 


(b) 


1 
D(ollp) — Tas Plas) 


= Tr o(log o — log p) — ——Trologe + Tralog p+ rele) 

ar hee _ Hic ae 

=t m+ = a= sy oge- vin + 20 
l-s 1- — 


=—sTrp,logp + w(s|p) = Dioulo). 


(c) The desired inequality follows from the inequality $D(\sigma\|\rho_s) \ge 0$ for $s \le 1$.
Exercise 3.10 


(a) It follows from Exercise 3.9. 
(b) It follows from (a) and the continuity of $H(\sigma)$ and $D(\sigma\|\rho)$.
(c) (3.33) follows from (b) and similar relations as (2.62)—(2.64). 


Exercise 3.11 


(a) It follows from Exercise 3.9. 
(b) It follows from (a) and the continuity of $H(\sigma)$ and $D(\sigma\|\rho)$.
(c) (3.33) follows from (b) and similar relations as (2.68) and (2.69). 


Exercise 3.12 Since $\rho^{\frac{1}{2}}\sigma^{-\frac{s}{1+s}}\rho^{\frac{1}{2}}$ is unitarily equivalent to $\sigma^{-\frac{s}{2(1+s)}}\rho\,\sigma^{-\frac{s}{2(1+s)}}$, we have the first expression. The next expression follows from the definition of the p-norm of matrices given in (A.24). The final expression also follows from the definition of the p-norm and the original definition of $\underline{D}_{1+s}(\rho\|\sigma)$.
Exercise 3.13 We can show (3.41) as follows.

$$ \min\{x \,|\, \rho \le x\sigma\} = \min\{x \,|\, \sigma^{-\frac{1}{2}}\rho\,\sigma^{-\frac{1}{2}} \le x\} = \|\sigma^{-\frac{1}{2}}\rho\,\sigma^{-\frac{1}{2}}\|. $$

Exercise 3.14 These can be shown by simple calculations of $b(\rho, \sigma)$ and $d_1(\rho, \sigma)$.


Exercise 3.15 


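(a) The Schwarz inequality implies that $\sqrt{\mathrm{Tr}\,XX^*}\sqrt{\mathrm{Tr}\,YY^*} \ge \mathrm{Tr}\,XY^*$ and $\sqrt{\mathrm{Tr}\,XX^*}\sqrt{\mathrm{Tr}\,YY^*} \ge \mathrm{Tr}\,YX^*$. Hence,

$$ \big(\sqrt{\mathrm{Tr}\,XX^*} + \sqrt{\mathrm{Tr}\,YY^*}\big)^2 = \mathrm{Tr}\,XX^* + \mathrm{Tr}\,YY^* + 2\sqrt{\mathrm{Tr}\,XX^*\,\mathrm{Tr}\,YY^*} $$
$$ \ge \mathrm{Tr}\,XX^* + \mathrm{Tr}\,YY^* + \mathrm{Tr}\,XY^* + \mathrm{Tr}\,YX^* = \mathrm{Tr}\,(X + Y)(X + Y)^*. $$

(b) Substitute $\sqrt{\rho_1} - \sqrt{\rho_3}U_2$ and $\sqrt{\rho_3}U_2 - \sqrt{\rho_2}U_1$ into $X$ and $Y$. Since

$$ \sqrt{\mathrm{Tr}\,(\sqrt{\rho_3}U_2 - \sqrt{\rho_2}U_1)(\sqrt{\rho_3}U_2 - \sqrt{\rho_2}U_1)^*} = \sqrt{\mathrm{Tr}\,(\sqrt{\rho_3} - \sqrt{\rho_2}U_1U_2^*)(\sqrt{\rho_3} - \sqrt{\rho_2}U_1U_2^*)^*}, $$

we have the desired inequality.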
(c) Choose U; and U2 such that b(p;, p3) = [Tr(vpi- yeaU2) (Vpi— V/p3U2)" and 
b(ps. pr) = /Tr(Vas— VraUU3) (Ys p23). Then, = (pi, mn) < 


{tr (/pi — /p2 U) (/pi — J/p2 U) . Combining the inequality and (b), we obtain 
the desired inequality. 


Exercise 3.16 Note that $2(x^2 + y^2) \ge (x + y)^2$.
Exercise 3.17


(b) Take the average by integrating between $[0, 2\pi]$ for each $\theta_i$. Note that $\langle u|v\rangle$ is continuous for each $\theta_i$.


Exercise 3.18 Choose the orthogonal basis $\{u_1, u_2, \ldots\}$ such that $u = u_1$ and $v = xu_1 + yu_2$ with $x, y \ge 0$. Then, using the matrix representation under the basis, we have

$$ |u\rangle\langle u| - |v\rangle\langle v| = \begin{pmatrix} 1 - x^2 & -xy \\ -xy & -y^2 \end{pmatrix} = \begin{pmatrix} y^2 & -xy \\ -xy & -y^2 \end{pmatrix}. $$

Due to Exercise A.3, its trace norm is $2\sqrt{y^4 + (xy)^2} = 2y\sqrt{y^2 + x^2} = 2y = 2\sqrt{1 - |\langle u|v\rangle|^2}$.
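The trace-norm formula of Exercise 3.18 can be verified directly from the eigenvalues of the $2 \times 2$ block. The sketch below is illustrative only: the angle `theta` and the function name are assumptions, and the vectors are taken real as in the solution above.

```python
import math

def trace_norm_difference(x, y):
    """Trace norm of |u><u| - |v><v| for u = e1 and v = x*e1 + y*e2
    (real x, y with x^2 + y^2 = 1)."""
    # Entries of the real symmetric matrix [[a, b], [b, d]] in the basis {e1, e2}.
    a, b, d = 1 - x * x, -x * y, -y * y
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    # The eigenvalues are mean + radius and mean - radius.
    return abs(mean + radius) + abs(mean - radius)

theta = 0.7                       # hypothetical angle between u and v
x, y = math.cos(theta), math.sin(theta)
lhs = trace_norm_difference(x, y)
rhs = 2 * math.sqrt(1 - x * x)    # 2*sqrt(1 - |<u|v>|^2), since <u|v> = x
print(lhs, rhs)
```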


Exercise 3.19 


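(a) Use the Schwarz inequality.
(c) Choose $U$ such that $|\rho^{1/2}\sigma^{1/2}| = U\rho^{1/2}\sigma^{1/2}$. Note that $|\mathrm{Tr}\,U\rho^{1/2}M_i\sigma^{1/2}| \ge \mathrm{Tr}\,U\rho^{1/2}M_i\sigma^{1/2}$.


Exercise 3.20


(a) Make the polar decomposition $A = U|A|$. Hence, $\|AA^\dagger\| = \|U|A||A|U^\dagger\| = \||A||A|\| = \||A|U^\dagger U|A|\| = \|A^\dagger A\|$.
(b) $\mathrm{Tr}\,\rho^2\sigma^{-1} = \mathrm{Tr}\,\rho\,\rho^{\frac{1}{2}}\sigma^{-1}\rho^{\frac{1}{2}} \le \|\rho^{\frac{1}{2}}\sigma^{-1}\rho^{\frac{1}{2}}\| = \|\sigma^{-\frac{1}{2}}\rho\,\sigma^{-\frac{1}{2}}\|$.


(c) This fact can be shown by using Exercise 3.5.
(d) Using (b), we can show that $D_2(\rho\|\sigma) \le D_{\max}(\rho\|\sigma)$. Combining this with (c), we can show (3.12).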


Exercise 3.21 Note that the spectral decomposition $\sum_i \lambda_i M_i$ of $\rho^{1/2}U^*\sigma^{-1/2}$ satisfies $\lambda_i M_i \sigma^{1/2} = M_i \rho^{1/2}U^*$.


Exercise 3.22 
(b) See the hint for Exercise 3.21. 
Exercise 3.23 Assume that $\rho - \sigma = \sum_i x_i M_i$, where $\{M_i\}$ is a PVM. Then, $\mathrm{Tr}\,(\rho - \sigma)M_i = x_i\,\mathrm{Tr}\,M_i$. Hence,

$$ 2d_1(\rho, \sigma) = \|\rho - \sigma\|_1 = \mathrm{Tr} \sum_i |x_i| M_i = \sum_i |\mathrm{Tr}\,\rho M_i - \mathrm{Tr}\,\sigma M_i| = 2d_1(P^M_\rho, P^M_\sigma). $$


Exercise 3.24 Let $M$ be a POVM that satisfies the equality in (3.46). Applying (2.25) to $P^M_\rho$ and $P^M_\sigma$, we obtain $d_1(P^M_\rho, P^M_\sigma) \ge b^2(P^M_\rho, P^M_\sigma)$. Finally, adding (3.47), we obtain $d_1(\rho, \sigma) \ge b^2(\rho, \sigma)$.


Exercise 3.25 It can be shown in a similar way to Exercise 3.24.
Exercise 3.26 It can be shown in a similar way to Exercise 3.24.
Exercise 3.27 $-\log \mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}| = -\log(1 - b^2(\rho, \sigma)) \ge b^2(\rho, \sigma)$.


Exercise 3.28 Since $0 \le x \le 1$, we have $1 - x \le 1 - x^2 \le 2(1 - x)$, which implies (3.51).


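Exercise 3.29 Due to (3.59), we have

$$ 1 - d_1(\kappa(\rho), \kappa(\sigma)) = \min_{I \ge T \ge 0} (\mathrm{Tr}\,\kappa(\rho)(I - T) + \mathrm{Tr}\,\kappa(\sigma)T) = \min_{I \ge T \ge 0} (\mathrm{Tr}\,\rho(I - \kappa^*(T)) + \mathrm{Tr}\,\sigma\kappa^*(T)) $$
$$ \ge \min_{I \ge T \ge 0} (\mathrm{Tr}\,\rho(I - T) + \mathrm{Tr}\,\sigma T) = 1 - d_1(\rho, \sigma), $$

which implies (5.51).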
Exercise 3.30 


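(a) Define the function $g(y)$ as

$$ g(y) := d(x, y) - 2(y - x)^2 = x\log\frac{x}{y} + (1 - x)\log\frac{1 - x}{1 - y} - 2(y - x)^2. $$

Then,

$$ \frac{dg}{dy}(y) = -\frac{x}{y} + \frac{1 - x}{1 - y} - 4(y - x) = \frac{y(1 - x) - x(1 - y)}{y(1 - y)} - 4(y - x) = \frac{y - x}{y(1 - y)} - 4(y - x) $$
$$ = (y - x)\Big(\frac{1}{y(1 - y)} - 4\Big) = (y - x)\frac{1 - 4y(1 - y)}{y(1 - y)} = (y - x)\frac{(2y - 1)^2}{y(1 - y)}. $$

Thus, $g(y)$ takes the minimum 0 at $y = x$.
(c) Choose a two-valued POVM $M = \{P, I - P\}$, where $P$ is given in (b). Then,

$$ D(\rho\|\sigma) \ge D(P^M_\rho\|P^M_\sigma) \ge 2d_1^2(P^M_\rho, P^M_\sigma) = 2d_1^2(\rho, \sigma). $$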
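The binary inequality of part (a), $d(x, y) \ge 2(y - x)^2$, can be checked on a grid. This is a minimal sketch with hypothetical function names; natural logarithms are assumed, as in the derivative computed above.

```python
import math

def binary_relative_entropy(x, y):
    """d(x, y) = x log(x/y) + (1-x) log((1-x)/(1-y)) for 0 < x, y < 1."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

def g(x, y):
    """Gap between the binary relative entropy and 2(y - x)^2."""
    return binary_relative_entropy(x, y) - 2 * (y - x) ** 2

grid = [i / 50 for i in range(1, 50)]
min_gap = min(g(x, y) for x in grid for y in grid)
print(min_gap)
```

The gap is smallest near $x = y = 1/2$, where the factor $(2y - 1)^2$ in $dg/dy$ vanishes.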


Exercise 3.31 


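$$ D(\rho\|\sigma) = \mathrm{Tr} \sum_i a_i|u_i\rangle\langle u_i| \Big(\log \sum_i a_i|u_i\rangle\langle u_i| - \log \sum_j b_j|v_j\rangle\langle v_j|\Big) $$
$$ = \mathrm{Tr} \sum_i a_i|u_i\rangle\langle u_i| \Big(\sum_i (\log a_i)|u_i\rangle\langle u_i| - \sum_j (\log b_j)|v_j\rangle\langle v_j|\Big) $$
$$ = \sum_i a_i\log a_i - \sum_{i,j} a_i\log b_j \langle v_j|u_i\rangle\langle u_i|v_j\rangle $$
$$ = \sum_{i,j} a_i(\log a_i - \log b_j)|\langle u_i|v_j\rangle|^2 $$
$$ = \sum_{i,j} a_i|\langle u_i|v_j\rangle|^2 \big(\log a_i|\langle u_i|v_j\rangle|^2 - \log b_j|\langle u_i|v_j\rangle|^2\big) $$
$$ = D(P_{(\rho\|\sigma)}\|Q_{(\rho\|\sigma)}). $$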


Exercise 3.32 Note that Tr |X| is equal to the sum of the absolute values of the 
eigenvalues of X. Using this fact, we can show (3.60). 


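Exercise 3.33 Since $\mathrm{Tr}\,(\rho - \sigma) = 0$, substituting $(\rho - \sigma)$ into $X$ of (3.60), we obtain

$$ \mathrm{Tr}\,\rho\{\rho - \sigma \le 0\} + \mathrm{Tr}\,\sigma\{\rho - \sigma > 0\} = \mathrm{Tr}\,\rho\{\rho - \sigma \le 0\} + \mathrm{Tr}\,\sigma(I - \{\rho - \sigma \le 0\}) $$
$$ = 1 + \mathrm{Tr}\,(\rho - \sigma)\{\rho - \sigma \le 0\} = 1 - \frac{1}{2}\mathrm{Tr}\,(\rho - \sigma)(I - 2\{\rho - \sigma \le 0\}) = 1 - \frac{1}{2}\|\rho - \sigma\|_1. $$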


Exercise 3.34 || pmix — pll1 = 2 Tr(pmix — p){pmix — p = 0}. Hence, 2 — ||pmix — pll) = 2 — 2Tr 
(Pmix — P){Pmix — p = 0} < 2— 2Tr pmix{Pmix — p = 0} = 2Tr pmix{Pmix — p < O} < 2Tr pmix{0 
<p} = 22. 
Exercise 3.35

(a) Since $(\sqrt{A} - \sqrt{B})\{\sqrt{A} \le \sqrt{B}\} \le 0$, we have $\mathrm{Tr}\,\sqrt{A}(\sqrt{A} - \sqrt{B})\{\sqrt{A} \le \sqrt{B}\} \le 0$, which implies (3.81). Similarly, we obtain (3.82).

(b) Summing the inequalities (3.81) and (3.82), we obtain (3.80).

Exercise 3. 36D = ETE Mi < <>, 1M =1 (MS lat 4 Bur- 
ther, Li Lk es < gllMilli lle = ¢ maxy orl Dy Mills = tse 
lel] Dj) Tr M; = ¢ max; |Ipill- 


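Exercise 3.37 Use Cramér's theorem with $X = -\log\frac{p(x)}{\bar{p}(x)}$, $\theta = s$, $x = 0$, and show that $\lim_{n\to\infty} -\frac{1}{n}\log p^n\Big\{-\frac{1}{n}\log\Big(\frac{p^n(x^n)}{\bar{p}^n(x^n)}\Big) \ge 0\Big\} = \max_{s\ge 0} -\phi(s)$ and $\lim_{n\to\infty} -\frac{1}{n}\log \bar{p}^n\Big\{-\frac{1}{n}\log\Big(\frac{p^n(x^n)}{\bar{p}^n(x^n)}\Big) \le 0\Big\} = \max_{s\le 1} -\phi(s)$.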


Exercise 3.38 


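(a) $\phi(s) = \log\mathrm{Tr}\,|u\rangle\langle u|^{1-s}\sigma^s = \log\langle u|\sigma^s|u\rangle$. Since $\sigma^s \ge \sigma$ for $s \in (0,1)$, we have $\langle u|\sigma^s|u\rangle \ge \langle u|\sigma|u\rangle$. Hence, we have $\inf_{1\ge s\ge 0} \phi(s) = \inf_{1\ge s\ge 0} \log\langle u|\sigma^s|u\rangle = \log\langle u|\sigma|u\rangle$.
(b)

$$ \min_{I \ge T \ge 0} \big(\mathrm{Tr}\,\rho^{\otimes n}(I - T) + \mathrm{Tr}\,\sigma^{\otimes n}T\big) \le \mathrm{Tr}\,|u\rangle\langle u|^{\otimes n}(I - |u\rangle\langle u|^{\otimes n}) + \mathrm{Tr}\,\sigma^{\otimes n}|u\rangle\langle u|^{\otimes n} $$
$$ = \mathrm{Tr}\,\sigma^{\otimes n}|u\rangle\langle u|^{\otimes n} = \langle u|\sigma|u\rangle^n = \exp\Big(n \inf_{1\ge s\ge 0} \phi(s)\Big). $$


Exercise 3.39 When the POVM $\{|u\rangle\langle u|, I - |u\rangle\langle u|\}$ is applied, the outcome obeys the distribution $P_0$ or $P_1$, where $P_0(0) := 1$, $P_0(1) := 0$, $P_1(0) := \langle u|\sigma|u\rangle$, and $P_1(1) := 1 - \langle u|\sigma|u\rangle$. Then, $\phi(s|P_0\|P_1) = \log\langle u|\sigma|u\rangle^s = s\log\langle u|\sigma|u\rangle$. Thus, $\inf_{1\ge s\ge 0} \phi(s|P_0\|P_1) = \inf_{1\ge s\ge 0} s\log\langle u|\sigma|u\rangle = \phi(1|P_0\|P_1) = \log\langle u|\sigma|u\rangle$ because $\log\langle u|\sigma|u\rangle \le 0$.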


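Exercise 3.40
(a) Exercise 3.18 and (3.59) imply that

$$ \min_{I \ge T \ge 0} \big(\mathrm{Tr}\,|u\rangle\langle u|^{\otimes n}(I - T) + \mathrm{Tr}\,|v\rangle\langle v|^{\otimes n}T\big) = 1 - \frac{1}{2}\big\||u\rangle\langle u|^{\otimes n} - |v\rangle\langle v|^{\otimes n}\big\|_1 = 1 - \sqrt{1 - |\langle u|v\rangle|^{2n}}. $$

(b)

$$ \lim_{n\to\infty} \frac{1}{n}\log\Big(1 - \sqrt{1 - |\langle u|v\rangle|^{2n}}\Big) = \lim_{n\to\infty} \frac{1}{n}\log\Big(1 - \Big(1 - \frac{1}{2}|\langle u|v\rangle|^{2n}\Big)\Big) = \lim_{n\to\infty} \frac{1}{n}\log\Big(\frac{1}{2}|\langle u|v\rangle|^{2n}\Big) = \log|\langle u|v\rangle|^2. $$

(c) Since $\inf_{1\ge s\ge 0} \phi(s) = \inf_{1\ge s\ge 0} \log|\langle u|v\rangle|^2 = \log|\langle u|v\rangle|^2$, we have (3.87).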
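The limit in part (b) can be observed numerically. The sketch below assumes a hypothetical squared overlap $|\langle u|v\rangle|^2 = 0.36$ and uses the algebraically equivalent form $1 - \sqrt{1 - a} = a / (1 + \sqrt{1 - a})$ to avoid cancellation for small $a$; the function name is illustrative.

```python
import math

def error_exponent(overlap_sq, n):
    """(1/n) log(1 - sqrt(1 - |<u|v>|^(2n))) for squared overlap `overlap_sq`."""
    a = overlap_sq ** n
    # 1 - sqrt(1 - a) = a / (1 + sqrt(1 - a)) avoids cancellation for tiny a.
    return math.log(a / (1 + math.sqrt(1 - a))) / n

overlap_sq = 0.36                 # hypothetical value of |<u|v>|^2
target = math.log(overlap_sq)
for n in (1, 10, 100, 500):
    print(n, error_exponent(overlap_sq, n), target)
```

The difference from $\log|\langle u|v\rangle|^2$ is roughly $(\log 2)/n$ for large $n$, which is the contribution of the factor $\frac{1}{2}$ in part (b).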
Exercise 3.41 Since $U^* = U$, we have $\mathrm{Tr}\,\rho^{1-s}\sigma^s = \mathrm{Tr}\,\rho^{1-s}U\rho^sU^* = \mathrm{Tr}\,U\rho^{1-s}U^*\rho^s = \mathrm{Tr}\,\sigma^{1-s}\rho^s$, which implies the symmetry $\phi(s) = \phi(1 - s)$. The conclusion can be derived from the convexity of $\phi(s)$ and the symmetry.


Exercise 3.42 We choose $s_1$ as the solution of $\phi(s) = 2s\log F(\rho, \sigma)$. We also choose $s_2$ as the solution of $\phi(s) = 2(1 - s)\log F(\rho, \sigma)$. Assume that a convex function $f$ on $(0, 1)$ satisfies $\phi(s) \le f(s) \le 0$ and $\log F(\rho, \sigma) \le f(1/2)$. Considering the graph of $f(s)$, we find that $f(s) \ge \phi(s_2)$ for $s \in [0, 1/2]$ and $f(s) \ge \phi(s_1)$ for $s \in [1/2, 1]$.

Remember that the assumption guarantees the symmetry $\phi(s) = \phi(1 - s)$. Since $\phi''(1/2) > 0$ and $\phi(1/2|\rho\|\sigma) < \log F(\rho, \sigma)$, we have $\phi(s_2) > \phi(1/2)$ and $\phi(s_1) > \phi(1/2)$. Further, for any POVM $M$, $\phi(s|P^M_\rho\|P^M_\sigma)$ satisfies the condition for $f(s)$. Thus, $\min_M \inf_{1\ge s\ge 0} \phi(s|P^M_\rho\|P^M_\sigma) \ge \min(\phi(s_2), \phi(s_1)) > \phi(1/2)$.

Similarly, for any POVM $M^n$, $\frac{1}{n}\phi(s|P^{M^n}_{\rho^{\otimes n}}\|P^{M^n}_{\sigma^{\otimes n}})$ satisfies the condition for $f(s)$. Hence, $\frac{1}{n}\min_{M^n} \inf_{1\ge s\ge 0} \phi(s|P^{M^n}_{\rho^{\otimes n}}\|P^{M^n}_{\sigma^{\otimes n}}) \ge \min(\phi(s_2), \phi(s_1)) > \phi(1/2)$. Therefore, $\lim_{n\to\infty} \frac{1}{n}\min_{M^n} \inf_{1\ge s\ge 0} \phi(s|P^{M^n}_{\rho^{\otimes n}}\|P^{M^n}_{\sigma^{\otimes n}}) \ge \min(\phi(s_2), \phi(s_1)) > \phi(1/2)$.


Exercise 3.43 


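(b)

$$ D(q\|p) - \frac{1}{1-s}D(q\|p_s) $$
$$ = \sum_x q(x)(\log q(x) - \log p(x)) - \frac{1}{1-s}\sum_x q(x)(\log q(x) - \log p(x)) + \frac{s}{1-s}\sum_x q(x)(\log \bar{p}(x) - \log p(x)) - \frac{\phi(s)}{1-s} $$
$$ = -\frac{s}{1-s}\sum_x q(x)(\log q(x) - \log \bar{p}(x)) - \frac{\phi(s)}{1-s} $$
$$ = -\frac{s}{1-s}D(q\|\bar{p}) - \frac{\phi(s)}{1-s} = -\frac{s}{1-s}D(p_s\|\bar{p}) - \frac{\phi(s)}{1-s} $$
$$ = -\frac{s}{1-s}\sum_x p_s(x)\big((1 - s)\log p(x) + s\log \bar{p}(x) - \log \bar{p}(x)\big) + \frac{s}{1-s}\phi(s) - \frac{\phi(s)}{1-s} $$
$$ = -s\sum_x p_s(x)(\log p(x) - \log \bar{p}(x)) - \phi(s) $$
$$ = \sum_x p_s(x)\big(\log(p(x)^{1-s}\bar{p}(x)^s) - \log p(x)\big) - \phi(s) = D(p_s\|p). $$

Here, the step replacing $D(q\|\bar{p})$ by $D(p_s\|\bar{p})$ uses the condition $D(q\|\bar{p}) = D(p_s\|\bar{p})$.

(c) The desired inequality follows from the inequality $\frac{1}{1-s}D(q\|p_s) \ge 0$ for $s < 1$.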
Exercise 3.44 


(b) Notice that the set of all distributions forms an exponential family by adding generators $\{g_i\}_{i\ge 2}$. Here, we choose $g_1(x)$ to be $\log \bar{p}(x) - \log p(x)$. $D(q\|p)$ can be regarded as a Bregman divergence. Next, we apply Theorem 2.3 to the following case: $\mathcal{M}$ is the mixture subfamily containing $p_s$ with the generator $g_1$. $\mathcal{E}$ is the exponential subfamily $\{p_s(x)\}$, which is generated by $g_1$. Then, $\mathcal{M}$ contains $p_s$ and $q$. Choosing the parameters $\theta$, $\theta'$, and $\theta^*$ to indicate the distributions $p$, $q$, and $p_s$, respectively, we obtain the desired argument.


Exercise 3.45 


(a) Use 2 < D(p||[p). 
(b) Substitute q = p ands = O in the right and left hand sides, respectively. 
(c) $(s - 1)\phi'(s) - \phi(s) = (s - 1)\sum_x p_s(x)(\log \bar{p}(x) - \log p(x)) - \phi(s) = \sum_x p_s(x)\big((1 - s)\log p(x) + s\log \bar{p}(x) - \phi(s) - \log \bar{p}(x)\big) = D(p_s\|p_1)$.

$s\phi'(s) - \phi(s) = s\sum_x p_s(x)(\log \bar{p}(x) - \log p(x)) - \phi(s) = \sum_x p_s(x)\big((1 - s)\log p(x) + s\log \bar{p}(x) - \phi(s) - \log p(x)\big) = D(p_s\|p_0)$.
(e) The map $s \mapsto D(p_s\|\bar{p})$ is continuous and monotonically decreasing in the domain $[0, 1]$. It has the range $[0, D(p\|\bar{p})]$.
(f) It follows from Exercise 3.43.
(g) $\frac{d}{ds}D(p_s\|p) = \frac{d}{ds}(s\phi'(s) - \phi(s)) = s\phi''(s) \ge 0$. Thus, the map $s \mapsto D(p_s\|p)$ is monotonically increasing. The map $r \mapsto s_r$ is monotonically decreasing. So, the map $r \mapsto D(p_{s_r}\|p)$ is also monotonically decreasing. Thus, (3.114) follows from (f).
(h) $\frac{-s_r r - \phi(s_r)}{1 - s_r} = \frac{-s_r D(p_{s_r}\|\bar{p}) - \phi(s_r)}{1 - s_r} = \frac{-s_r((s_r - 1)\phi'(s_r) - \phi(s_r)) - \phi(s_r)}{1 - s_r} = s_r\phi'(s_r) - \phi(s_r) = D(p_{s_r}\|p)$.
(j) $\frac{d}{ds}\frac{-sr - \phi(s)}{1 - s} = \frac{-r + (s - 1)\phi'(s) - \phi(s)}{(1 - s)^2}$, which vanishes at $s = s_r$. Since $\frac{d}{ds}(-r + (s - 1)\phi'(s) - \phi(s)) = (s - 1)\phi''(s) < 0$, the quantity $\frac{-r + (s - 1)\phi'(s) - \phi(s)}{(1 - s)^2}$ is $< 0$ for $s > s_r$ and $> 0$ for $s < s_r$. Hence, the maximum of $\frac{-sr - \phi(s)}{1 - s}$ is realized only when $s = s_r$.
(k) The desired equation follows from the combination of (g), (h), and (j).
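The two identities of part (c), $s\phi'(s) - \phi(s) = D(p_s\|p)$ and $(s-1)\phi'(s) - \phi(s) = D(p_s\|\bar{p})$, can be checked numerically for a small alphabet. This is a minimal sketch; the distributions `p` and `pbar` are hypothetical, and $p_s(x) = p(x)^{1-s}\bar{p}(x)^s e^{-\phi(s)}$ is assumed as the tilted family.

```python
import math

p = [0.5, 0.3, 0.2]       # hypothetical distribution p
pbar = [0.2, 0.3, 0.5]    # hypothetical distribution p-bar

def phi(s):
    """phi(s) = log sum_x p(x)^(1-s) * pbar(x)^s."""
    return math.log(sum(a ** (1 - s) * b ** s for a, b in zip(p, pbar)))

def tilted(s):
    """p_s(x) = p(x)^(1-s) * pbar(x)^s * exp(-phi(s))."""
    weights = [a ** (1 - s) * b ** s for a, b in zip(p, pbar)]
    total = sum(weights)
    return [w / total for w in weights]

def phi_prime(s):
    # phi'(s) is the mean of log pbar(x) - log p(x) under p_s.
    return sum(w * (math.log(b) - math.log(a))
               for w, a, b in zip(tilted(s), p, pbar))

def relative_entropy(r, t):
    return sum(x * math.log(x / y) for x, y in zip(r, t) if x > 0)

def check(s):
    """Return the two gaps, which should both be numerically zero."""
    ps = tilted(s)
    gap_p = relative_entropy(ps, p) - (s * phi_prime(s) - phi(s))
    gap_pbar = relative_entropy(ps, pbar) - ((s - 1) * phi_prime(s) - phi(s))
    return gap_p, gap_pbar

for s in (-2.0, -0.5, 0.3, 0.8):
    print(s, check(s))
```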


1—p00 1-q00 
Exercise 3.46 Choose p = 0 44 )ando= 0 $4 |.ThePVMM = 
0 22 o ¢¢ 
2-2 22 
100 000 000 
{M;} is givenasM, = | 000],M,={010 ] M3; = {000 }. Although M, 
000 000 001 
commutes panda, M> and M; do not commute p and o.. However, choosing a; = = 
l—p 
2 00 
and a. = a3 = a we have > a;M; = 0 A 0 |, which satisfies Condition 
0 02 


q 
(3.120). 


Exercise 3.47 
p (pi | p's.) 


= = (1 Tr My... P °) log (1 Tr Mz, °) — log (I Tr My, -) 


k=1 k=1 
= => (I Tr Mi. OD: (log Tr M;.,,,e — log Tr MZ. a) 
=1 


Wh k=1 


= > > Tr dk.w, Mi..., p (log Tr aku, Mi, p — log Tr dku, Mi., a) 
Wy k=l 


n 
= > > Tr dx, Mi.,, P (log Tr ak,, Mey, P — log Tr aku, Meu, a) 


k=l Wn 


k=1 


ee) 


Exercise 3.48 The desired equation follows from R,, = — ae (so). 


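Exercise 3.49 $-\phi'(s) = \sum_x p_s(x)(\log p(x) - \log \bar{p}(x)) = D_{\max}(p\|\bar{p}) + o(1)$ as $s \to -\infty$.


Exercise 3.50 From the derivation of Exercise 3.49, we find that $-\phi'(s) = D_{\max}(p\|\bar{p}) + o(1/s)$ as $s \to -\infty$.

Since $\frac{\sum_{x: \log p(x) - \log \bar{p}(x) < D_{\max}(p\|\bar{p})} p(x)^{1-s}\bar{p}(x)^s}{P(p\|\bar{p}) e^{-sD_{\max}(p\|\bar{p})}} = o(1)$ as $s \to -\infty$, we have $\phi(s) = \log P(p\|\bar{p}) - sD_{\max}(p\|\bar{p}) + \log\Big(1 + \frac{\sum_{x: \log p(x) - \log \bar{p}(x) < D_{\max}(p\|\bar{p})} p(x)^{1-s}\bar{p}(x)^s}{P(p\|\bar{p}) e^{-sD_{\max}(p\|\bar{p})}}\Big) = \log P(p\|\bar{p}) - sD_{\max}(p\|\bar{p}) + o(1)$. Thus, (3.109) implies that $D(p_s\|\bar{p}) = (s - 1)\phi'(s) - \phi(s) = -(s - 1)D_{\max}(p\|\bar{p}) + o(1) - \log P(p\|\bar{p}) + sD_{\max}(p\|\bar{p}) + o(1) = D_{\max}(p\|\bar{p}) - \log P(p\|\bar{p}) + o(1)$ as $s \to -\infty$.

Similarly, (3.110) implies that $D(p_s\|p) = s\phi'(s) - \phi(s) = -sD_{\max}(p\|\bar{p}) + o(1) - \log P(p\|\bar{p}) + sD_{\max}(p\|\bar{p}) + o(1) = -\log P(p\|\bar{p}) + o(1)$ as $s \to -\infty$.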
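The limits (3.157)-(3.159) can be observed numerically at a large negative $s$. The sketch below assumes hypothetical distributions in which two symbols attain the maximal likelihood ratio, takes $P(p\|\bar{p})$ and $\bar{P}(p\|\bar{p})$ as the $p$-mass and $\bar{p}$-mass of that set, and works with log-weights to avoid overflow.

```python
import math

p = [0.4, 0.2, 0.4]       # hypothetical distribution p
pbar = [0.2, 0.1, 0.7]    # hypothetical distribution p-bar

def log_weights(s):
    return [(1 - s) * math.log(a) + s * math.log(b) for a, b in zip(p, pbar)]

def phi(s):
    lw = log_weights(s)
    m = max(lw)
    # log-sum-exp keeps the computation stable for large |s|.
    return m + math.log(sum(math.exp(w - m) for w in lw))

def tilted(s):
    lw = log_weights(s)
    m = max(lw)
    weights = [math.exp(w - m) for w in lw]
    total = sum(weights)
    return [w / total for w in weights]

def phi_prime(s):
    return sum(w * (math.log(b) - math.log(a))
               for w, a, b in zip(tilted(s), p, pbar))

d_max = max(math.log(a / b) for a, b in zip(p, pbar))
support = [i for i, (a, b) in enumerate(zip(p, pbar))
           if abs(math.log(a / b) - d_max) < 1e-12]
P = sum(p[i] for i in support)        # p-mass of the maximizing set
Pbar = sum(pbar[i] for i in support)  # pbar-mass of the maximizing set

s = -200.0
print(-phi_prime(s), d_max)
print(s * phi_prime(s) - phi(s), -math.log(P))
print((s - 1) * phi_prime(s) - phi(s), -math.log(Pbar))
```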
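Exercise 3.51 It follows from $1 = \frac{dr}{dr} = \frac{d}{dr}D(p_{s_r}\|p_1) = \frac{d}{dr}\big((s_r - 1)\phi'(s_r) - \phi(s_r)\big) = \frac{ds_r}{dr}\frac{d}{ds}\big((s - 1)\phi'(s) - \phi(s)\big)\big|_{s=s_r} = \frac{ds_r}{dr}(s_r - 1)\phi''(s_r)$.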


Exercise 3.52 


(a) The map $s \mapsto D(p_s\|\bar{p})$ is continuous and monotonically decreasing in the domain $(-\infty, 0]$. It has the range $[D(p\|\bar{p}), -\log \bar{P}(p\|\bar{p}))$.

(b) It follows from Exercise 3.43.

(c) $\frac{d}{ds}D(p_s\|p) = \frac{d}{ds}(s\phi'(s) - \phi(s)) = s\phi''(s) \le 0$. Thus, the map $s \mapsto D(p_s\|p)$ is monotonically decreasing. Since the map $r \mapsto s_r$ is also monotonically decreasing, the map $r \mapsto D(p_{s_r}\|p)$ is monotonically increasing. Thus, $\min_{q: D(q\|\bar{p}) \ge r} D(q\|p) = D(p_{s_r}\|p)$ follows from (3.162).

The relations (3.110), (3.160), and (3.162) imply that $\min_{q: D(q\|\bar{p}) = r'} D(q\|p) + r - D(q\|\bar{p}) = r + s_{r'}\phi'(s_{r'}) - \phi(s_{r'}) - \big((s_{r'} - 1)\phi'(s_{r'}) - \phi(s_{r'})\big) = r + \phi'(s_{r'})$. Thus, $\frac{d}{dr'}\big(\min_{q: D(q\|\bar{p}) = r'} D(q\|p) + r - D(q\|\bar{p})\big) = \frac{ds_{r'}}{dr'}\phi''(s_{r'}) = \frac{\phi''(s_{r'})}{(s_{r'} - 1)\phi''(s_{r'})} = \frac{1}{s_{r'} - 1} < 0$. Hence, $\min_{q: D(q\|\bar{p}) = r'} D(q\|p) + r - D(q\|\bar{p})$ is monotone decreasing for $r'$. Thus, $\min_{q: D(q\|\bar{p}) \le r} D(q\|p) + r - D(q\|\bar{p}) = \min_{q: D(q\|\bar{p}) = r} D(q\|p) + r - D(q\|\bar{p}) = \min_{q: D(q\|\bar{p}) = r} D(q\|p) = D(p_{s_r}\|p)$.

(d) See (h) of Exercise 3.45.
(f) See (j) of Exercise 3.45.
(g) (3.161) follows from the combination of (3.163), (3.164), and (3.166).
Exercise 3.53


(a) The relation (3.165) implies that 4 — os) weal De - OS) See Dilly) 


—sy2 
Since lim;-,~69 D(ps|| p) = Doas(DlB) and the map s as Ppl) - is monotoni- 


cally decreasing, we have D(p,|| Pp) < Dmax(p|| p). Hence, up avi) sup 


S—>—0O 


=sr— slp) 


(b) $D(q\|p) - D(q\|\bar{p}) = \sum_x q(x)(\log \bar{p}(x) - \log p(x)) \ge -D_{\max}(p\|\bar{p})$. Consider the subset $\{x \,|\, \log p(x) - \log \bar{p}(x) = D_{\max}(p\|\bar{p})\}$. When the support of $q$ is included in the subset, we have the equality.

(c) Consider $q = p_s$ with the limit $s \to -\infty$.


Exercise 3.54 


(a) For the derivations of (3.172) and (3.173), substitute $\theta = s$ and $X = -\log \frac{p(x)}{\bar{p}(x)}$
in (2.163) and (2.165), respectively. For the derivation of (3.173), substitute $\theta = s - 1$
and $X = -\log \frac{p(x)}{\bar{p}(x)}$ in (2.165).
(b) They can be shown by using the convexity of ¢(s). 
(c) The definition of $s_r$, (3.109), and (3.175) imply $r = D(p_{s_r}\|\bar{p}) = (s_r - 1)\phi'(s_r)
- \phi(s_r) = \max_{s \le 1} (s - 1)\phi'(s_r) - \phi(s)$. The relations (3.166), (3.164), (3.110),
and (3.175) yield $\frac{-s_r r - \phi(s_r)}{1 - s_r} = D(p_{s_r}\|p) = s_r\phi'(s_r) - \phi(s_r) =
\max_{s \le 0} s\phi'(s_r) - \phi(s)$.


(d) Choose the test { —1tog (¢ “) < (5, )}. The relations (3.176) and (3.173) 


p" (x") 
guarantee that the error exponent of first kind is r. The relations (3.177) and (3.172) 
show that the exponent of the correct decision when true is p is suppl P| 
s<0 
(e) Choose the random test as follows: When the outcome belongs to the set 
{-2 log (2) = Dinax (Pl py}. we support the hypothesis p with probability 


eNr+los P(PIIP)) The definitions of P(p||p) and P(p||p) implies that 


p"(x") 
p"(x") 
p"(x") 
p(x") 


a ee ie : ae 
lim —— log p | -= tos ( ) = Pros( | = —log P(pllp), (3.185) 
n—0oo n n 


Him, — tog 9" | tog (ZEP) = Dap) | = log PPI). 3.186) 
noo Nn n 

The relation (3.186) guarantees that the error exponent of first kindisr + log P(p||p) — 
log P(p||p) =r. The relation (3.185) shows that the exponent of correct decision 
when true is p is r + log P(p||p) — log P(p||p) =r + Dmax(p|| Pp), which equals 
super IP) as shown in (3.167). 


s<0 


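Exercise 3.55 The relations (3.171), (3.177), and (3.160) yield that

$$\frac{dB^*(r|p\|\bar{p})}{dr} = \frac{d}{dr}\bigl(s_r\phi'(s_r) - \phi(s_r)\bigr) = \frac{1}{(s_r - 1)\phi''(s_r)} \frac{d}{ds}\bigl(s\phi'(s) - \phi(s)\bigr)\Big|_{s=s_r} = \frac{s_r}{s_r - 1} = 1 + \frac{1}{s_r - 1}.$$

Then, the relation (3.160) yields that

$$\frac{d^2 B^*(r|p\|\bar{p})}{dr^2} = \frac{d}{dr} \frac{1}{s_r - 1} = -\frac{ds_r}{dr} \frac{1}{(s_r - 1)^2} = -\frac{1}{(s_r - 1)\phi''(s_r)} \frac{1}{(s_r - 1)^2} = -\frac{1}{(s_r - 1)^3 \phi''(s_r)} > 0.$$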


Exercise 3.56 


(a) The existence of $\{T_n\}$ follows from the direct part of the quantum Stein's lemma.
(b) The desired inequality follows from the converse part of the quantum Stein's lemma.
(c) The inequality $B^*(r|\rho\|\sigma) \le \inf_{\tau: D(\tau\|\sigma) > r} D(\tau\|\rho)$ follows from (a) and (b). The
equation $\inf_{\tau: D(\tau\|\sigma) > r} D(\tau\|\rho) = \min_{\tau: D(\tau\|\sigma) \ge r} D(\tau\|\rho)$ follows from the continuity
of $D(\tau\|\rho)$.


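Exercise 3.57 Lemma 3.7 guarantees that $\lim_{n\to\infty} \operatorname{Tr} \tau^{\otimes n}(I - T_n) = 1$ because
$D(\tau\|\sigma) < \lim -\frac{1}{n} \log \operatorname{Tr} \sigma^{\otimes n}(I - T_n)$. Hence, applying Lemma 3.7 again, we have
$\lim -\frac{1}{n} \log \operatorname{Tr} \rho^{\otimes n} T_n \le D(\tau\|\rho)$.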


Exercise 3.58 The desired equation follows from ¢(s|p||o) > limn—oo 7S Pin [PMon) 
and d(slpllo) = limy—oo 1 4(s| Ken (p®") je="): 
Exercise 3.59 


(a) We give aspectral decomposition « = 97, « £;. Then, e?™ *° Il) — max; || E;o7!/ 
po'/? E; || < loo? pa 1/? | = eP max (pllO)2 Dimax Ko (PIO) 

(b) Lemma 3.10 implies that |E,an|Il(o2")— 1/74 cn (p2")(o2")— 1/7] = (2) 1/2 
p8"(g8")-1/2, Hence, Dax (Koen (p®") ||o®”) + log |Egen| > Dmax(p®" ||o®") = 
nDymax(p||o). Finally take the limit n — oo after the dividing the both side with n. 
(c) Due to (a), 12 ctseon PIM) < 1p. (eon (p®")I102") < ! Dynax(p2"|12") = Dinax (plo): 


Taking the limit n > oo, we have lim, 0 2S! < Dynax (plo). 


‘ o(— —s|Kan (p2")||o2" " o(—s 2 
Since 2% Stelle) > Leerlke ie ho we have limy 00 acta) = = littiy +60 


n 
@ cae ve n ( On) @n 1 
ales ee _ = Dinax (Kien (p®")||o®"). 


Taking the limit n — ov, we have lims-. 6 HEslolo) > limy oo 2 Dimax (Kien (p®") 
\|o®”) > Dmax(pl|o), which follows from (b). 
Exercise 3.60 Assume that o = >); 0;M;. Then, k,02(p®*) = 37). ;(Mi ® Mj + 
M; ® M;)p®"(M; ® M; + M; ® M;) + >); (M; ® M;)p®?(M; ® Mi), Hence, 
Tr K,22 (p®7) (o®”)* = a, o30; Tr[(M; ® Mj + Mj ® M;)p®"(M; ® M; + 
Mj ® Mj)]'~* + 3, 07° Tr(M; ® Mi)p®(M; ® M;))'~*. 

Onthe other hand, e%!Pr"lPo) =. _. ofo% Tr[(M; ® M;)p(Mj ® M;) + (Mi ® 
Mj)p®?(M; @ Mj)\'~* + >), 7° Tr((M; @ M;)p®°(Mi @ Mj))'~*. 

Hence, we have ele (P™)lle) _ 926(s1P IPS) — Dis; 7705 THM; ® Mj + 
Mj; ® M,)p®?(M; ® M; + M; ® M;))'~* — (Mj ® M;)p®*(M; ® M,))'*. 
— ((M; ® M;)p®"(M; ® M;)))'~*]. Since the rank-one PVM M = {M;} is not 

j)P j 
commutative with p, there exists a pair i, j such that (M; @ M;)p®*(M; ® M jp #9.
Since x > x!~° is strictly convex, Exercise 1.35 implies Tr[((M; ® M; + M; ® 
M;)p®"(M; ® M;+M; ®M;,))'* — (M; ® M;)p®*(M; @ M;))'*.— (M; ® 
Mj)p®*(M; ® Mj)))'~*] > 0. 


Exercise 3.61 Due to a discussion similar to Exercise 3.60, it is sufficient to show 
that Tr[((M; ® M; + M; ® M;)o®?(M; ® M; + M; ® M;))° — ((M; ® M;)o® 
(M; ® M;))*’.-— (M; ® M;)o®(M; @ M;))*| > 0. This inequality follows from 
Exercise 1.35 and the strict convexity of x > x°. 


Exercise 3.62 


(a) For any POVM M = {M;j}, there exists a rank-one POVM M’ := {M; ;} such 
that >* j M: j = Mi. Next, we choose the Naimark extension M " of M’. Then, M” 
is arank-one PVM M and ¢(s|P™"||P™") > ¢(s|P™||PM). 
(b) Consider the case when the rank-one PVM M = {M;} is not commutative with 
p. Then, apply Exercise 3.60 to the states Kyg(o) and p. Further, Exercise 3.58 
implies 24(s|pllo) > (8 |e (oye (P27) ||K (7)®?). Therefore, we obtain (3.181) in 
this case. 

Consider the case when the rank-one PVM M = {Mj} is commutative with p. We 
can show (3.181) by using Exercises 3.58 and 3.61 in a similar way. 

Chapter 4
Classical-Quantum Channel Coding 
(Message Transmission) 


Abstract Communication systems such as the Internet have become part of our 
daily lives. In any data-transmission system, data are always exposed to noise, and 
therefore it might be expected that information will be transmitted incorrectly. In 
practice, however, such problems can be avoided entirely. How is this possible? To
explain this, suppose that we send a piece of information that is either 0 or 1, and
that the sender and receiver agree that the former will send “000” instead of “0”
and “111” instead of “1.” If the receiver receives “010” or “100,” he or she can deduce
that the sender in fact sent 0. On the other hand, if the receiver receives “110” or
“101,” he or she can deduce that a 1 was sent. Therefore, we can reduce the chance of
error by introducing redundancy into the transmission. However, in order to further
reduce the chance of an error with this method, it is necessary to increase
the redundancy indefinitely. It had therefore been commonly believed that reducing
the error probability required increasing the redundancy indefinitely. However, in
1948, Shannon (Bell Syst Tech J 27:623-656, 1948) showed that by using a certain 
type of encoding scheme, it is possible to reduce the error probability indefinitely 
without increasing the redundancy beyond a fixed rate. This was a very surprising 
result since it was contrary to naive expectations at that time. The distinctive part of 
Shannon’s method was to treat communication in the symbolic form of 0s and 1s and
then to approach the problem of noise using encoding. In practical communication
systems such as optical fibers and electrical wires, codes such as 0 and 1 are sent
by transforming them into a physical medium. In particular, in order to achieve the 
theoretical optimal communication speed, we have to treat the physical medium of 
the communication as a microscopic object, i.e., quantum-mechanical object. In this 
quantum-mechanical scenario, it is most effective to treat the encoding process not 
as a transformation of the classical bits, e.g., 0s, 1s, and so on, but as a transformation
of the message into a quantum state. Furthermore, the measurement and decoding 
process can be thought of as a single step wherein the outcome of the quantum- 
mechanical measurement directly becomes the recovered message. 
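
The repetition scheme described above can be made quantitative with a short sketch. It assumes a classical channel that flips each transmitted bit independently with probability p and a receiver that decodes by majority vote; the function name `repetition_error` is illustrative and not part of the text.

```python
from math import comb


def repetition_error(n: int, p: float) -> float:
    """Probability that majority voting over n copies of a bit decodes wrongly.

    Assumes n is odd and that each copy is flipped independently with
    probability p (a hypothetical bit-flip channel used for illustration).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the majority is always defined")
    # Decoding fails exactly when more than half of the copies are flipped.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


def repetition_rate(n: int) -> float:
    """Transmission rate of the n-fold repetition code: one message bit per n sent bits."""
    return 1.0 / n
```

With p = 0.1, sending “000” or “111” lowers the error probability from 0.1 to 0.028, but the rate drops to 1/3. Pushing the error toward 0 in this way forces the rate toward 0, which is the behavior that Shannon's theorem shows to be avoidable.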

4.1 Formulation of the Channel Coding Process
in Quantum Systems 


There are two main processes involved in the transmission of classical information 
through a quantum channel. The first is the conversion of the classical message into a 
quantum state, which is called encoding. The second is the decoding of the message 
via a quantum measurement on the output system. For a reliable and economical 
communication, we should optimize these processes. However, it is impossible to 
reduce the error probability below a certain level in the single use of a quantum 
channel even with the optimal encoding and decoding. This is similar to a single use 
of a classical channel with a nonnegligible bit-flip probability. However, when we 
use a given channel repeatedly, it is possible in theory to reduce the error probability
to almost 0 by encoding and decoding. This reduction requires that the transmission
rate, i.e., the ratio of the original message bit size to the transmission bit size,
be less than a fixed rate. This fixed rate is the upper bound on the transmission rate of
reliable communication and is called the capacity. This statement was proved
mathematically by Shannon [1] and is called the channel coding theorem. Hence,
it is possible to reduce the error probability without reducing the transmission rate; 
however, complex encoding and decoding processes are required. This implies that 
it is possible to reduce the error probability to almost 0 while keeping a fixed com- 
munication speed if we group an n-bit transmission and then perform the encoding 
and decoding on this group. More precisely, the logarithm of the decoding error can 
then be decreased in proportion to the number n of transmissions, and the number n 
can be considered as the level of complexity required by the encoding and decoding 
processes. These facts are known in the quantum case as well as in the classical case. 

In this chapter, we first give a mathematical formulation for the single use of a 
quantum channel. Regarding n uses of the quantum channel as a single quantum 
channel, we treat the asymptotic theory in which the number n of the uses of the 
given channel is large. 

In the transmission of classical information via a quantum channel, we may denote
the channel as a map from the alphabet (set of letters) $\mathcal{X}$ to the set $\mathcal{S}(\mathcal{H})$ of quantum
states on the output system $\mathcal{H}$, i.e., a classical-quantum channel (c-q channel)
$W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$.¹ For mathematical simplicity, we assume that the linear span of the
supports of all $W_x$ equals the whole Hilbert space $\mathcal{H}$. The relevance of this formulation
may be verified as follows. Let us consider the state transmission channel from the
input system to the output system described by the map $\Gamma : \mathcal{S}(\mathcal{H}') \to \mathcal{S}(\mathcal{H})$, where
$\mathcal{H}'$ denotes the finite-dimensional Hilbert space of the input system. When the states
to be produced in the input system are given by the set $\{\rho_x\}_{x \in \mathcal{X}}$, the above map $W$ is
given by $W_x = \Gamma(\rho_x)$. That is, sending classical information via the above channel
reduces to the same problem as that with the c-q channel $W$.


¹As discussed later, these types of channels are called c-q channels to distinguish them from channels
with quantum inputs and outputs. Here, $\mathcal{X}$ is allowed to contain infinitely many elements with continuous
cardinality.

When all the densities $W_x$ are simultaneously diagonalizable, the problem is
reduced to a channel given by a stochastic transition matrix. As in hypothesis testing,
we may call such cases “classical.” Then, Theorem 4.1 (to be discussed later) also
gives the capacity for the classical channels given by a stochastic transition matrix.


4.1.1 Transmission Information in C-Q Channels 
and Its Properties 


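As in the classical case (2.34), the transmission information $I(p, W)$ and the average
state $W_p$ for the c-q channel $W$ are defined as²

$$I(p, W) \stackrel{\mathrm{def}}{=} \sum_{x \in \mathcal{X}} p(x) D(W_x \| W_p) = H(W_p) - \sum_{x \in \mathcal{X}} p(x) H(W_x), \tag{4.1}$$

$$W_p \stackrel{\mathrm{def}}{=} \sum_{x \in \mathcal{X}} p(x) W_x. \tag{4.2}$$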
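
As a numerical illustration of (4.1) and (4.2), the following sketch evaluates $I(p, W) = H(W_p) - \sum_x p(x) H(W_x)$ when every $W_x$ is a qubit state. It relies on two standard facts that are assumptions of the sketch rather than statements from the text: a qubit state is specified by a Bloch vector $r$ with $|r| \le 1$, and its eigenvalues are $(1 \pm |r|)/2$. The function names are illustrative.

```python
from math import log, sqrt


def binary_entropy(q: float) -> float:
    """h(q) = -q log q - (1 - q) log(1 - q) in nats, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log(q) - (1 - q) * log(1 - q)


def qubit_entropy(r) -> float:
    """von Neumann entropy of the qubit state with Bloch vector r."""
    norm = min(1.0, sqrt(sum(c * c for c in r)))  # guard against rounding above 1
    return binary_entropy((1 + norm) / 2)


def transmission_information(p, bloch_vectors) -> float:
    """I(p, W) = H(W_p) - sum_x p(x) H(W_x) for qubit states W_x."""
    # The average state W_p of (4.2) has the averaged Bloch vector.
    average = [sum(px * r[k] for px, r in zip(p, bloch_vectors)) for k in range(3)]
    return qubit_entropy(average) - sum(
        px * qubit_entropy(r) for px, r in zip(p, bloch_vectors))
```

For two orthogonal pure states sent with equal probability the result is log 2 (one bit), and for two identical states it is 0. For two pure states with overlap $|\langle u|v\rangle| = 0.8$ and uniform $p$, it equals $h(0.9)$.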


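The transmission information $I(p, W)$ satisfies the following two properties:

① (Concavity) Any two distributions $p^1$ and $p^2$ satisfy

$$I(\lambda p^1 + (1 - \lambda)p^2, W) \ge \lambda I(p^1, W) + (1 - \lambda) I(p^2, W). \tag{4.3}$$

See Exercise 5.27.

② (Subadditivity) Given two c-q channels $W^A$ from $\mathcal{X}_A$ to $\mathcal{H}_A$ and $W^B$ from $\mathcal{X}_B$ to
$\mathcal{H}_B$, we can naturally define the c-q channel $W^A \otimes W^B$ from $\mathcal{X}_A \times \mathcal{X}_B$ to $\mathcal{H}_A \otimes \mathcal{H}_B$
as

$$(W^A \otimes W^B)_{x_A, x_B} \stackrel{\mathrm{def}}{=} W^A_{x_A} \otimes W^B_{x_B}. \tag{4.4}$$

Let $p_A$, $p_B$ be the marginal distributions in $\mathcal{X}_A$, $\mathcal{X}_B$ for a probability distribution $p$
in $\mathcal{X}_A \times \mathcal{X}_B$, respectively. Then, we have the subadditivity

$$I(p, W^A \otimes W^B) \le I(p_A, W^A) + I(p_B, W^B). \tag{4.5}$$

This inequality can be shown as follows (Exercise 4.2):

$$I(p_A, W^A) + I(p_B, W^B) - I(p, W^A \otimes W^B) = D\bigl((W^A \otimes W^B)_p \,\big\|\, W^A_{p_A} \otimes W^B_{p_B}\bigr) \ge 0. \tag{4.6}$$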


²In many papers, the quantity $I(p, W)$ is called the quantum mutual information. In this text, it will
be called the transmission information of the c-q channel, for reasons given in Sect. 5.4. Occasionally 
we will denote this as (py, Wy). 

From this property (4.5), we can show that

$$\max_p I(p, W^A \otimes W^B) = \max_{p_A} I(p_A, W^A) + \max_{p_B} I(p_B, W^B),$$

which is closely connected to the additivity discussed later. Another property of the
transmission information is the inequality

$$I(p, W) = \sum_{x \in \mathcal{X}} p(x) D(W_x \| W_p) \le \sum_{x \in \mathcal{X}} p(x) D(W_x \| \sigma), \quad \forall \sigma \in \mathcal{S}(\mathcal{H}). \tag{4.7}$$

This inequality can be verified by noting that the RHS minus the LHS equals
$D(W_p\|\sigma) \ge 0$.
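
The one-line verification of (4.7) can be written out as follows. The only ingredients are the definition $D(\rho\|\sigma) = \operatorname{Tr} \rho(\log\rho - \log\sigma)$ and the definition (4.2) of $W_p$; the $\log W_x$ terms cancel in the first step.

```latex
\begin{align*}
\sum_{x} p(x) D(W_x\|\sigma) - \sum_{x} p(x) D(W_x\|W_p)
  &= \sum_{x} p(x) \operatorname{Tr} W_x \bigl(\log W_p - \log \sigma\bigr) \\
  &= \operatorname{Tr} W_p \bigl(\log W_p - \log \sigma\bigr) \\
  &= D(W_p\|\sigma) \;\ge\; 0 .
\end{align*}
```

Equality holds exactly when $\sigma = W_p$, so minimizing the RHS of (4.7) over $\sigma$ recovers $I(p, W)$.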


Exercises 


4.1 Show that $C_c(W) = h\left(\frac{1 + |\langle v|u\rangle|}{2}\right)$ if $\{W_x\}$ is composed of the two pure states
$|u\rangle\langle u|$ and $|v\rangle\langle v|$.


4.2 Show (4.6).


4.1.2 C-Q Channel Coding Theorem 


Next, we consider the problem of sending a classical message using a c-q channel
$W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$. For this purpose, we must mathematically define a code, which
is the combination of an encoder and a decoder, as shown in Fig. 4.1. These are given by
the triplet $(N, \varphi, Y)$. The number $N$ is a natural number corresponding to the size
of the encoder. $\varphi$ is a map, $\varphi : \{1, \ldots, N\} \to \mathcal{X}$, corresponding to the encoder. The
decoder is a quantum measurement taking values in the probability space $\{1, \ldots, N\}$.
Mathematically, it is given by the set of $N$ positive semi-definite Hermitian matrices
$Y = \{Y_i\}_{i=1}^N$ with $\sum_i Y_i \le I$. In this case, $I - \sum_i Y_i$ corresponds to the undecodable
decision.

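For an arbitrary code $\Phi = (N, \varphi, Y)$, we define the size $|\Phi|$ and the average error
probability $\varepsilon[\Phi]$ as

$$|\Phi| \stackrel{\mathrm{def}}{=} N, \qquad \varepsilon[\Phi] \stackrel{\mathrm{def}}{=} \frac{1}{N} \sum_{i=1}^{N} \bigl(1 - \operatorname{Tr} W_{\varphi(i)} Y_i\bigr). \tag{4.8}$$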


Fig. 4.1 Encoding and decoding (message $i$ → encoder $\varphi$ → alphabet letter $x = \varphi(i)$ → channel $W$ → output state $W_x$ → decoding $Y$)

If we need to identify the c-q channel $W$ to be discussed, we denote the average
error probability by $\varepsilon_W[\Phi]$. We then consider the encoding and decoding
for $n$ communications grouped into one. For simplicity, let us assume that each
communication is independent and identical. That is, we discuss the case where
the Hilbert space of the output system is $\mathcal{H}^{\otimes n}$, and the c-q channel is given by
the map $W^{(n)} : x^n = (x_1, \ldots, x_n) \mapsto W^{(n)}_{x^n} \stackrel{\mathrm{def}}{=} W_{x_1} \otimes \cdots \otimes W_{x_n}$ from the alphabet
$\mathcal{X}^n$ to $\mathcal{S}(\mathcal{H}^{\otimes n})$. Such a channel is called stationary memoryless. An encoder of
size $N_n$ is given by the map $\varphi^{(n)}$ from $\{1, \ldots, N_n\}$ to $\mathcal{X}^n$, and it is written as
$\varphi^{(n)}(i) = (\varphi^{(n)}_1(i), \ldots, \varphi^{(n)}_n(i))$. The decoder is also given by the POVM $Y^{(n)}$ on
$\mathcal{H}^{\otimes n}$. Let us see how much information can be sent per transmission if the error
probability asymptotically approaches 0. For this purpose, we look at the limit of the
transmission rate $R = \lim_{n\to\infty} \frac{1}{n} \log |\Phi^{(n)}|$ ($|\Phi^{(n)}| \cong e^{nR}$) for the sequence of reliable
codes $\{\Phi^{(n)} = (N_n, \varphi^{(n)}, Y^{(n)})\}$ and discuss its bound, i.e., the c-q channel capacity
$C_c(W)$³:


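$$C_c(W) \stackrel{\mathrm{def}}{=} \sup_{\{\Phi^{(n)}\}} \left\{ \lim_{n\to\infty} \frac{1}{n} \log |\Phi^{(n)}| \;\middle|\; \lim_{n\to\infty} \varepsilon[\Phi^{(n)}] = 0 \right\}, \tag{4.9}$$

where $\Phi^{(n)}$ denotes a code for the quantum channel $W^{(n)}$. We may also define the
strong converse c-q channel capacity as the dual capacity

$$C_c^{\dagger}(W) \stackrel{\mathrm{def}}{=} \sup_{\{\Phi^{(n)}\}} \left\{ \lim_{n\to\infty} \frac{1}{n} \log |\Phi^{(n)}| \;\middle|\; \lim_{n\to\infty} \varepsilon[\Phi^{(n)}] < 1 \right\}, \tag{4.10}$$

which clearly satisfies the inequality $C_c(W) \le C_c^{\dagger}(W)$. We then have the following
theorem.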


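Theorem 4.1 ([2–6]) Let $\mathcal{P}_f(\mathcal{X})$ be the set of probability distributions with a finite
support in $\mathcal{X}$. Then,

$$C_c^{\dagger}(W) = C_c(W) = \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{x \in \mathcal{X}} D(W_x \| \sigma) \tag{4.11}$$

holds.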
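
In the classical special case, where every $W_x$ is a probability vector, the two expressions in (4.11) can be compared numerically. The sketch below uses the Blahut-Arimoto iteration, a standard algorithm for classical channel capacity that is not discussed in the text and is brought in here purely for illustration. It returns a lower bound $I(p, W)$ and an upper bound $\max_x D(W_x\|W_p)$; the latter bounds $\min_\sigma \sup_x D(W_x\|\sigma)$ from above because $\sigma = W_p$ is one admissible choice. The function names and the iteration count are assumptions of the sketch.

```python
from math import exp, log


def relative_entropy(q, r) -> float:
    """Classical relative entropy D(q || r) in nats (terms with q(y) = 0 are skipped)."""
    return sum(a * log(a / b) for a, b in zip(q, r) if a > 0)


def blahut_arimoto(W, iterations: int = 200):
    """Approximate sup_p I(p, W) for a classical channel W[x][y].

    Returns (p, lower, upper) with lower = I(p, W) and
    upper = max_x D(W_x || W_p) >= min_sigma sup_x D(W_x || sigma).
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in

    def divergences(dist):
        Wp = [sum(dist[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        return [relative_entropy(W[x], Wp) for x in range(n_in)]

    for _ in range(iterations):
        d = divergences(p)
        # Re-weight each input letter by exp(D(W_x || W_p)) and renormalize.
        weights = [p[x] * exp(d[x]) for x in range(n_in)]
        total = sum(weights)
        p = [w / total for w in weights]

    d = divergences(p)
    lower = sum(p[x] * d[x] for x in range(n_in))
    upper = max(d)
    return p, lower, upper
```

For the binary symmetric channel with flip probability 0.1, both bounds coincide at $\log 2 - h(0.1) \approx 0.368$ nats, in agreement with the equality of the sup and min-sup expressions in (4.11).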


Thus, this theorem connects the c-q channel capacity $C_c(W)$ to the transmission
information $I(p, W)$. Here, note that the former is operationally defined, while the
latter is formally defined. The additivity of the c-q channel capacity

$$C_c(W^A) + C_c(W^B) = C_c(W^A \otimes W^B) \tag{4.12}$$

also follows from inequality (4.5).
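For a proof of Theorem 4.1, we introduce two quantities:

$$I_{1-s}(p, W) \stackrel{\mathrm{def}}{=} -\frac{1}{s} \log \sum_{x} p(x) \operatorname{Tr} W_x^{1-s} W_p^{s}, \tag{4.13}$$

$$I^{\downarrow}_{1-s}(p, W) \stackrel{\mathrm{def}}{=} -\frac{1-s}{s} \log \operatorname{Tr} \Bigl(\sum_{x} p(x) W_x^{1-s}\Bigr)^{\frac{1}{1-s}}, \tag{4.14}$$

$$C^{\downarrow}_{1-s}(W) \stackrel{\mathrm{def}}{=} \sup_{p \in \mathcal{P}_f(\mathcal{X})} I^{\downarrow}_{1-s}(p, W), \tag{4.15}$$

where $I_1(p, W)$ and $I^{\downarrow}_1(p, W)$ are defined to be $I(p, W)$. These quantities
satisfy (Exercises 4.4 and 4.23)

$$\lim_{s \to 0} I_{1-s}(p, W) = I(p, W), \tag{4.16}$$

$$\lim_{s \to 0} C^{\downarrow}_{1-s}(W) = \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{x \in \mathcal{X}} D(W_x \| \sigma). \tag{4.17}$$

³The subscript c of $C_c$ indicates the sending of “classical” information.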

Then, Theorem 4.1 may be proved using these properties and the two lemmas given 
below in a similar way to that of hypothesis testing. 


Lemma 4.1 (Direct Part [4, 5]) For an arbitrary real number $R > 0$ and an arbitrary
distribution $p \in \mathcal{P}_f(\mathcal{X})$, there exists a code $\Phi^{(n)}$ for the stationary memoryless
quantum channel $W^{(n)}$ such that

$$\varepsilon[\Phi^{(n)}] \le 4 e^{n \min_{s \in [0,1]} s(R - I_{1-s}(p, W))}, \qquad |\Phi^{(n)}| = e^{nR}. \tag{4.18}$$
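
To get a feel for the quantities (4.13) and (4.14) and for the exponent appearing in the direct part, the following sketch evaluates them in the commuting (classical) case, where each $W_x$ is a probability vector with strictly positive entries and the traces become sums over outputs $y$. The function names, the grid search over $s$, and the positivity assumption are choices of the sketch, not of the text.

```python
from math import log


def average_output(p, W):
    """W_p(y) = sum_x p(x) W_x(y)."""
    return [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]


def transmission_information(p, W) -> float:
    """I(p, W) = sum_x p(x) D(W_x || W_p) in nats."""
    Wp = average_output(p, W)
    return sum(p[x] * W[x][y] * log(W[x][y] / Wp[y])
               for x in range(len(p)) for y in range(len(Wp))
               if p[x] > 0 and W[x][y] > 0)


def renyi_information(p, W, s: float) -> float:
    """I_{1-s}(p, W) of (4.13) for commuting W_x; s must be nonzero."""
    Wp = average_output(p, W)
    total = sum(p[x] * W[x][y] ** (1 - s) * Wp[y] ** s
                for x in range(len(p)) for y in range(len(Wp)))
    return -log(total) / s


def renyi_information_down(p, W, s: float) -> float:
    """I^down_{1-s}(p, W) of (4.14) for commuting W_x; s nonzero and s < 1."""
    total = sum(sum(p[x] * W[x][y] ** (1 - s) for x in range(len(p))) ** (1 / (1 - s))
                for y in range(len(W[0])))
    return -(1 - s) / s * log(total)


def direct_exponent(p, W, rate: float, grid: int = 100) -> float:
    """Grid approximation of min over s in [0, 1] of s * (R - I_{1-s}(p, W))."""
    candidates = [0.0]  # the value at s = 0
    candidates += [k / grid * (rate - renyi_information(p, W, k / grid))
                   for k in range(1, grid + 1)]
    return min(candidates)
```

For a rate below $I(p, W)$ the exponent is strictly negative, so the bound of Lemma 4.1 decays exponentially in $n$; for a rate above $I(p, W)$ the minimum is attained at $s = 0$ and the bound becomes trivial.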


Lemma 4.2 (Converse Part [6]) When a code $\Phi^{(n)}$ for the stationary memoryless
quantum channel satisfies $|\Phi^{(n)}| = e^{nR}$, the relation

$$1 - \varepsilon[\Phi^{(n)}] \le e^{n \min_{s \in (-\infty, 0]} \frac{s(R - C^{\downarrow}_{1-s}(W))}{1-s}} \tag{4.19}$$

holds.


Proof of Theorem 4.1 Thanks to Lemma 4.1, when $R < I(p, W) \le \sup_{p' \in \mathcal{P}_f(\mathcal{X})} I(p', W)$,
the relation (4.16) guarantees that the exponent $\min_{s \in [0,1]} s(R - I_{1-s}(p, W))$
is strictly negative, which implies that the decoding error probability $\varepsilon[\Phi^{(n)}]$ goes to zero
exponentially. On the other hand, thanks to Lemma 4.2, when $R > \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W)$,
the relation (4.17) guarantees that the exponent $\min_{s \in (-\infty, 0]} \frac{s(R - C^{\downarrow}_{1-s}(W))}{1-s}$ is strictly
negative, which implies that the quantity $1 - \varepsilon[\Phi^{(n)}]$ goes to zero exponentially. These
two facts and (4.17) show (4.11). ∎


Indeed, this theorem can be generalized to the case when a sender sends the message
to $M$ receivers. This case is formulated by an $M$-output channel $W^1, \ldots, W^M$, with $M$
output systems $\mathcal{H}_1, \ldots, \mathcal{H}_M$ and a single input system, where $W^i = (W^i_x)$. In this
case, the encoder is defined in the same way, i.e., $\varphi : \{1, \ldots, N\} \to \mathcal{X}$. However,
the decoder is defined by $M$ POVMs $Y^1, \ldots, Y^M$. That is, the code is described by
$\Phi = (N, \varphi, Y^1, \ldots, Y^M)$. In this case, the error of $\Phi$ is given as the worst decoding
error probability $\varepsilon[\Phi] \stackrel{\mathrm{def}}{=} \max_{1 \le i \le M} \frac{1}{N} \sum_{j=1}^{N} \bigl(1 - \operatorname{Tr} W^i_{\varphi(j)} Y^i_j\bigr)$. Further, in the same
way as (4.9), the capacity $C_c(W^1, \ldots, W^M)$ is defined as

$$C_c(W^1, \ldots, W^M) \stackrel{\mathrm{def}}{=} \sup_{\{\Phi^{(n)}\}} \left\{ \lim_{n\to\infty} \frac{1}{n} \log |\Phi^{(n)}| \;\middle|\; \lim_{n\to\infty} \varepsilon[\Phi^{(n)}] = 0 \right\}. \tag{4.20}$$

Then, we have the following proposition.

Proposition 4.1

$$C_c(W^1, \ldots, W^M) = \sup_{p} \min_{1 \le i \le M} I(p, W^i). \tag{4.21}$$

For a proof, see Exercises 4.13 and 4.32.

Exercises 


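4.3 Define $J_{1-s}(p, \sigma, W) := -\frac{1}{s} \log \sum_x p(x) \operatorname{Tr} W_x^{1-s} \sigma^{s}$. Show that

$$I^{\downarrow}_{1-s}(p, W) = \min_{\sigma} J_{1-s}(p, \sigma, W) \tag{4.22}$$

and that the above minimum is attained only by

$$\sigma_{1-s|p} := \Bigl(\sum_{x} p(x) W_x^{1-s}\Bigr)^{\frac{1}{1-s}} \Big/ \operatorname{Tr} \Bigl(\sum_{x} p(x) W_x^{1-s}\Bigr)^{\frac{1}{1-s}}. \tag{4.23}$$

Hint: Use the matrix Hölder inequality (A.26) and the reverse matrix Hölder inequality
(A.28).


4.4 Show (4.16) and

$$\lim_{s \to 0} I^{\downarrow}_{1-s}(p, W) = I(p, W). \tag{4.24}$$

4.5 Show that

$$I^{\downarrow}_{1-s}(p, W) \le I_{1-s}(p, W). \tag{4.25}$$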


4.6 Show the following inequality for s € [—1, 1] \ {0} as an inequality opposite to 
(4.25) when all of W, are commutative with each other [7, (16)]. 


a (p, W) = Ty_-s(p, W). (4.26) 


Inequality (4.26) can be shown by a similar way as (5.145) in the general case. 

4.2 Coding Protocols with Adaptive Decoding
and Feedback


In the previous section, there was no restriction on the measurements for decoding.
Now, we shall restrict these measurements to adaptive decoding, which has the
following form:

$$M^n = \bigl\{ M_{1,y_1} \otimes \cdots \otimes M^{y_1, \ldots, y_{n-1}}_{n,y_n} \bigr\}_{(y_1, \ldots, y_n) \in \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n}.$$

Therefore, the decoder may be written as the POVM $M^n$ and the mapping $\tau^{(n)} :
\mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n \to \{1, \ldots, N_n\}$.

We also allow feedback during the encoding process. That is, the receiver is
allowed to send his or her measurement outcomes back to the sender, who then
performs the encoding based on these outcomes. In the previous section, we considered
the encoder to be a map $\varphi^{(n)}(i) = (\varphi^{(n)}_1(i), \ldots, \varphi^{(n)}_n(i))$ from $\{1, \ldots, N_n\}$ to $\mathcal{X}^n$.
If we allow feedback, the $k$th encoding element will be given by a map $\tilde{\varphi}^{(n)}_k$ from
$\{1, \ldots, N_n\} \times \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_{k-1}$ to $\mathcal{X}$. Therefore, in this case, we denote the encoder
as $\tilde{\varphi}^{(n)} \stackrel{\mathrm{def}}{=} (\tilde{\varphi}^{(n)}_1, \ldots, \tilde{\varphi}^{(n)}_n)$.

Henceforth, we call $\tilde{\Phi}^{(n)} = (N_n, \tilde{\varphi}^{(n)}, M^n, \tau^{(n)})$ the code with adaptive decoding
and feedback and denote its size $N_n$ by $|\tilde{\Phi}^{(n)}|$. The average error probability of the
code is denoted by $\varepsilon[\tilde{\Phi}^{(n)}]$. If the code has no feedback, it belongs to the restricted
subclass of codes given in the previous section. However, if it has feedback, it does
not belong to this subclass. That is, the class of codes given in this section is not a
subclass of codes given in the previous section. This class of codes is the subject of
the following theorem.


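Theorem 4.2 (Fujiwara and Nagaoka [8]) Define the c-q channel capacity with
adaptive decoding and feedback $\tilde{C}_c(W)$ as

$$\tilde{C}_c(W) \stackrel{\mathrm{def}}{=} \sup_{\{\tilde{\Phi}^{(n)}\}} \left\{ \lim_{n\to\infty} \frac{1}{n} \log |\tilde{\Phi}^{(n)}| \;\middle|\; \lim_{n\to\infty} \varepsilon[\tilde{\Phi}^{(n)}] = 0 \right\}, \tag{4.27}$$

where $\tilde{\Phi}^{(n)}$ is a code with adaptive decoding and feedback. Then,

$$\tilde{C}_c(W) = \sup_{M} \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(M, p, W), \tag{4.28}$$

where $I(M, p, W) \stackrel{\mathrm{def}}{=} \sum_{x \in \mathcal{X}} p(x) D(P^M_{W_x} \| P^M_{W_p})$.


When the maximum $\max_M \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(M, p, W)$ exists, the capacity $\tilde{C}_c(W)$ can
be attained by performing the optimal measurement $M_M \stackrel{\mathrm{def}}{=} \operatorname{argmax}_M \sup_{p \in \mathcal{P}_f(\mathcal{X})}
I(M, p, W)$ on each output system. Thus, there is no improvement if we use adaptive
decoding and encoding with feedback.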
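
The quantity $I(M, p, W)$ in (4.28) can be compared with the transmission information $I(p, W)$ in a small example. The sketch below is an illustration under stated assumptions: the channel outputs one of two pure qubit states $|u\rangle, |v\rangle = (\cos a, \pm\sin a)$ with real amplitudes, $p$ is uniform, and $M$ is restricted to projective measurements in a real orthonormal basis rotated by an angle $b$. The function names are illustrative, and the restriction to uniform $p$ and projective $M$ means the sketch does not compute the full supremum in (4.28).

```python
from math import cos, log, pi


def binary_entropy(q: float) -> float:
    """h(q) in nats, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log(q) - (1 - q) * log(1 - q)


def holevo_two_pure_states(a: float) -> float:
    """I(p, W) for uniform p over |u>, |v> = (cos a, +-sin a).

    The average state has eigenvalues (1 +- |<u|v>|) / 2 with <u|v> = cos(2a).
    """
    return binary_entropy((1 + abs(cos(2 * a))) / 2)


def measured_information(a: float, b: float) -> float:
    """I(M, p, W) for uniform p and the projective measurement rotated by angle b."""
    q_u = cos(a - b) ** 2  # probability of outcome 0 when |u> was sent
    q_v = cos(a + b) ** 2  # probability of outcome 0 when |v> was sent
    # Classical mutual information of the induced binary-input, binary-output channel.
    return binary_entropy((q_u + q_v) / 2) - (binary_entropy(q_u) + binary_entropy(q_v)) / 2
```

For $a = \pi/8$ the symmetric measurement $b = \pi/4$ yields about 0.277 nats, whereas $I(p, W)$ is about 0.416 nats. No rotation angle closes this gap, which is consistent with the strict inequality between the two capacities noted after the proof of Theorem 4.2.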
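
Proof For an arbitrary positive real number $\epsilon > 0$, we choose a POVM $M$ such that
$\sup_p I(M, p, W) \ge \sup_{M'} \sup_p I(M', p, W) - \epsilon$. Then, the relation between the input
letter and the output measurement data is described by the stochastic transition matrix
$x \mapsto P^M_{W_x}$. Applying Theorem 4.1 to the classical channel $x \mapsto P^M_{W_x}$, we see that a
code attaining $\sup_p I(M, p, W)$ exists. That is, $\tilde{C}_c(W) \ge \sup_{M'} \sup_p I(M', p, W) -
\epsilon$. Since $\epsilon > 0$ is arbitrary, we have $\tilde{C}_c(W) \ge \sup_{M'} \sup_p I(M', p, W)$.

Next, we show that there is no code with a rate exceeding the RHS of (4.28). Consider
a sequence of codes $\{\tilde{\Phi}^{(n)} = (N_n, \tilde{\varphi}^{(n)}, M^n, \tau^{(n)})\}$ satisfying $\lim_{n\to\infty} \varepsilon[\tilde{\Phi}^{(n)}] =
0$. Let $X$ be a uniformly distributed random variable taking values in the input messages
$\{1, \ldots, N_n\}$ and $Y^n = (Y_1, \ldots, Y_n)$ be the random variable corresponding to the
outcome of the measurement $M^n$. Since $\varepsilon[\tilde{\Phi}^{(n)}] = P\{X \ne \tau^{(n)}(Y^n)\}$, Fano’s inequality
(2.35) yields

$$\log 2 + \varepsilon[\tilde{\Phi}^{(n)}] \log N_n \ge H(X) - I(X : \tau^{(n)}(Y^n)) = \log N_n - I(X : \tau^{(n)}(Y^n)) \tag{4.29}$$

because $H(X) = \log N_n$.
Now, to evaluate $I(X : \tau^{(n)}(Y^n))$, we define

$$P_{Y_k|X,Y^{k-1}}(y_k|x, y^{k-1}) \stackrel{\mathrm{def}}{=} \operatorname{Tr} W_{\tilde{\varphi}^{(n)}_k(x, y^{k-1})} M^{y^{k-1}}_{k,y_k},$$

that is, the outcome distribution of the $k$th measurement $M^{y^{k-1}}_k = \{M^{y^{k-1}}_{k,y_k}\}_{y_k}$ applied to
the $k$th output state. From the monotonicity of the classical relative entropy and the chain rule (2.32) for
mutual information,

$$\begin{aligned}
I(X : \tau^{(n)}(Y^n)) &\le I(X : Y^n) = \sum_{k=1}^{n} I(X : Y_k | Y^{k-1}) \\
&= \sum_{k=1}^{n} \sum_{y^{k-1}} P_{Y^{k-1}}(y^{k-1}) I(X : Y_k | Y^{k-1} = y^{k-1}) \\
&= \sum_{k=1}^{n} \sum_{y^{k-1}} P_{Y^{k-1}}(y^{k-1}) I\bigl(M^{y^{k-1}}_k, P_{X_k|Y^{k-1}=y^{k-1}}, W\bigr) \\
&\le n \sup_{M} \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(M, p, W),
\end{aligned} \tag{4.30}$$

where $X_k \stackrel{\mathrm{def}}{=} \tilde{\varphi}^{(n)}_k(X, Y^{k-1})$ denotes the $k$th input letter.
(4.29) and (4.30) yield that

$$\log 2 + \varepsilon[\tilde{\Phi}^{(n)}] \log N_n \ge \log N_n - n \sup_{M} \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(M, p, W), \tag{4.31}$$

which can be rewritten as

$$\frac{1}{n} \log N_n \le \frac{(\log 2)/n + \sup_{M} \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(M, p, W)}{1 - \varepsilon[\tilde{\Phi}^{(n)}]}. \tag{4.32}$$

Since $\varepsilon[\tilde{\Phi}^{(n)}] \to 0$,

$$\lim_{n\to\infty} \frac{1}{n} \log N_n \le \sup_{M} \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(M, p, W),$$

completing the proof. ∎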


Therefore, if the decoder uses no correlation in the measuring apparatus, the c-q
channel capacity is given by $\tilde{C}_c(W)$. Next, we consider the c-q channel capacity
when correlations among $n$ systems are allowed. Under this assumption, we can regard
the $n$ uses of the channel $W$ as a single channel $W^{(n)}$ and then reuse the arguments
presented in this section. Therefore, the c-q channel capacity is given by $\tilde{C}_c(W^{(n)})/n$. Its
limiting case is

$$\lim_{n\to\infty} \frac{\tilde{C}_c(W^{(n)})}{n} = C_c(W), \tag{4.33}$$

while $\tilde{C}_c(W) < C_c(W)$, except for special cases such as those given in Sect. 4.7. An
interesting question is whether it is possible to experimentally realize a transmission
rate exceeding $\tilde{C}_c(W)$. This is indeed possible, and a channel $W$ and a measurement
$M$ have been experimentally constructed with $I(M, p, W^{(2)}) > 2\tilde{C}_c(W)$ by Fujiwara
et al. [9].


Exercise 


4.7 Show (4.33) using Fano’s inequality (2.35) in a similar way to the proof of 
Theorem 4.2. 


4.3 Channel Capacities Under Cost Constraint 


Thus far, there have been no constraints on the encoding, and we have examined only
the size of the code and error probabilities. However, it is not unusual to impose a
constraint that the cost, e.g., the energy required for communication, should be less
than some fixed value. In this situation, we define a cost function and demand that the
cost for each code should be less than some fixed value. More precisely, a cost $c(x)$ is
defined for each state $W_x$ used in the communication. In the stationary memoryless
case, the cost for the states $W^{(n)}_{x^n}$ is given by $c^{(n)}(x^n) \stackrel{\mathrm{def}}{=} \sum_{i=1}^{n} c(x_i)$. The states $W^{(n)}_{x^n}$
used for communication are then restricted to those that satisfy $\sum_i c(x_i) \le Kn$. That
is, any code $\Phi^{(n)} = (N_n, \varphi^{(n)}, Y^{(n)})$ must satisfy the restriction $\max_i c^{(n)}(\varphi^{(n)}(i)) \le
Kn$. The following theorem can be proved in a similar way to Theorem 4.1.
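
Theorem 4.3 ([10, 11]) Define the c-q channel capacities under the cost constraint

$$C_{c;c \le K}(W) \stackrel{\mathrm{def}}{=} \sup_{\{\Phi^{(n)}\}} \left\{ \lim_{n\to\infty} \frac{1}{n} \log |\Phi^{(n)}| \;\middle|\; \max_i \frac{c^{(n)}(\varphi^{(n)}(i))}{n} \le K, \ \lim_{n\to\infty} \varepsilon[\Phi^{(n)}] = 0 \right\},$$

$$C^{\dagger}_{c;c \le K}(W) \stackrel{\mathrm{def}}{=} \sup_{\{\Phi^{(n)}\}} \left\{ \lim_{n\to\infty} \frac{1}{n} \log |\Phi^{(n)}| \;\middle|\; \max_i \frac{c^{(n)}(\varphi^{(n)}(i))}{n} \le K, \ \lim_{n\to\infty} \varepsilon[\Phi^{(n)}] < 1 \right\}.$$

Then,

$$C_{c;c \le K}(W) = C^{\dagger}_{c;c \le K}(W) = \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I(p, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} \sum_{x} p(x) D(W_x \| \sigma), \tag{4.34}$$

where $\mathcal{P}_{c \le K}(\mathcal{X}) \stackrel{\mathrm{def}}{=} \{p \in \mathcal{P}_f(\mathcal{X}) \mid \sum_x p(x) c(x) \le K\}$.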
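
The effect of the constraint set $\mathcal{P}_{c \le K}(\mathcal{X})$ in (4.34) can be seen in a small classical example. The sketch below assumes a binary input alphabet and commuting outputs (so each $W_x$ is a probability vector) and approximates $\sup_{p \in \mathcal{P}_{c \le K}} I(p, W)$ by a brute-force grid search. The function names and the grid resolution are choices of the sketch.

```python
from math import log


def transmission_information(p, W) -> float:
    """I(p, W) = sum_x p(x) D(W_x || W_p) in nats for a classical channel W[x][y]."""
    Wp = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    return sum(p[x] * W[x][y] * log(W[x][y] / Wp[y])
               for x in range(len(p)) for y in range(len(Wp))
               if p[x] > 0 and W[x][y] > 0)


def constrained_capacity(W, cost, K: float, grid: int = 10000) -> float:
    """Grid approximation of sup I(p, W) over binary-input p with sum_x p(x) c(x) <= K."""
    best = 0.0
    for k in range(grid + 1):
        p1 = k / grid
        p = [1.0 - p1, p1]
        # Keep only the distributions inside the constraint set P_{c <= K}.
        if p[0] * cost[0] + p[1] * cost[1] <= K + 1e-12:
            best = max(best, transmission_information(p, W))
    return best
```

For a noiseless binary channel with costs $c(0) = 0$ and $c(1) = 1$, the constraint $K = 0.1$ limits the capacity to $h(0.1) \approx 0.325$ nats, while any $K \ge 1/2$ gives the unconstrained value $\log 2$.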


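For a proof of Theorem 4.3, we introduce a quantity:

$$C^{\downarrow}_{1-s|c \le K}(W) \stackrel{\mathrm{def}}{=} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I^{\downarrow}_{1-s}(p, W). \tag{4.35}$$

This quantity satisfies

$$\lim_{s \to 0} C^{\downarrow}_{1-s|c \le K}(W) = \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I(p, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} \sum_{x} p(x) D(W_x \| \sigma). \tag{4.36}$$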


Then, Theorem 4.3 can be obtained from the following two lemmas in a similar way 
to Theorem 4.1. These lemmas will be proved later in Sects. 4.5 and 4.6. 


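Lemma 4.3 (Direct Part) For arbitrary real numbers $R > 0$ and $K$ and an arbitrary
distribution $p \in \mathcal{P}_{c \le K}(\mathcal{X})$, there exists a code $\Phi^{(n)} = (N_n, \varphi^{(n)}, Y^{(n)})$ for the
stationary memoryless quantum channel $W^{(n)}$ such that

$$\varepsilon[\Phi^{(n)}] \le \frac{4}{C_{n,K}^2} e^{n \min_{s \in [0,1]} s(R - I_{1-s}(p, W))}, \qquad |\Phi^{(n)}| = e^{nR}, \tag{4.37}$$

$$\max_i \frac{c^{(n)}(\varphi^{(n)}(i))}{n} \le K, \tag{4.38}$$

where $C_{n,K} \stackrel{\mathrm{def}}{=} p^n\{c^{(n)}(x) \le nK\}$.


Lemma 4.4 (Converse Part) When a code $\Phi^{(n)}$ for the stationary memoryless quantum
channel satisfies $|\Phi^{(n)}| = e^{nR}$ and $\max_i \frac{c^{(n)}(\varphi^{(n)}(i))}{n} \le K$, the relation

$$1 - \varepsilon[\Phi^{(n)}] \le e^{n \min_{s \in (-\infty, 0]} \frac{s(R - C^{\downarrow}_{1-s|c \le K}(W))}{1-s}} \tag{4.39}$$

holds.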

Proof of Theorem 4.3 We choose $p \in \mathcal{P}_{c \le K}(\mathcal{X})$ such that $R < I(p, W) \le
\sup_{p' \in \mathcal{P}_{c \le K}(\mathcal{X})} I(p', W)$. Then, the central limit theorem guarantees that the quantity
$C_{n,K}$ does not go to zero (it goes to 1/2 in the boundary case $\sum_x p(x) c(x) = K$). The
relation (4.16) guarantees that the exponent $\min_{s \in [0,1]} s(R - I_{1-s}(p, W))$ is strictly
negative. Hence, the decoding error probability $\varepsilon[\Phi^{(n)}]$ in Lemma
4.3 goes to zero exponentially. On the other hand, thanks to Lemma 4.4, when
$R > \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I(p, W)$, the relation (4.36) guarantees that the exponent
$\min_{s \in (-\infty, 0]} \frac{s(R - C^{\downarrow}_{1-s|c \le K}(W))}{1-s}$ is strictly negative, which implies that the quantity
$1 - \varepsilon[\Phi^{(n)}]$ goes to zero exponentially. These two facts show Theorem 4.3. ∎

Exercise 


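4.8 Let the set $\{W_x\}$ consist entirely of pure states. Let the cost function $c$ be given
by $c(x) = \operatorname{Tr} W_x E$, where $E$ is a positive semidefinite Hermitian matrix on $\mathcal{H}$. Show
that $C_{c;c \le K}(W) = C^{\dagger}_{c;c \le K}(W) = H(\rho_{\beta_K, E})$, where $\rho_{\beta, E} \stackrel{\mathrm{def}}{=} e^{-\beta E} / \operatorname{Tr} e^{-\beta E}$ and $\beta_K$
satisfies $\operatorname{Tr}\bigl(e^{-\beta_K E} / \operatorname{Tr} e^{-\beta_K E}\bigr) E = K$.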


4.4 A Fundamental Lemma 


In this section, we will prove the lemma required for the proof of Theorem 4.1. 


Lemma 4.5 (Hayashi and Nagaoka [11]) When two arbitrary Hermitian matrices
$S$ and $T$ satisfy $I \ge S \ge 0$ and $T \ge 0$, the following inequality holds:

$$I - \sqrt{S+T}^{\,-1} S \sqrt{S+T}^{\,-1} \le 2(I - S) + 4T, \tag{4.40}$$

where $\sqrt{S+T}^{\,-1}$ is the generalized inverse matrix of $\sqrt{S+T}$ given in Sect. 1.5.


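Proof Let $P$ be a projection to the range of $S + T$. Since $P$ commutes with $S$ and $T$,
for proving (4.40), it is sufficient to show that

$$P\bigl[I - \sqrt{S+T}^{\,-1} S \sqrt{S+T}^{\,-1}\bigr]P \le P\bigl[2(I - S) + 4T\bigr]P,$$

$$P^{\perp}\bigl[I - \sqrt{S+T}^{\,-1} S \sqrt{S+T}^{\,-1}\bigr]P^{\perp} \le P^{\perp}\bigl[2(I - S) + 4T\bigr]P^{\perp},$$

where we defined $P^{\perp} = I - P$. The second inequality follows from $P^{\perp}S = P^{\perp}T =
P^{\perp}\sqrt{S+T}^{\,-1} = 0$. Thus, for proving (4.40), it is sufficient to show that (4.40) holds
when the range of $S + T$ is equal to $\mathcal{H}$.

Since $(A - B)^*(A - B) \ge 0$ for matrices, we obtain $A^*B + B^*A \le A^*A + B^*B$.
Applying this inequality to the case of $A = \sqrt{T}$ and $B = \sqrt{T}(\sqrt{S+T}^{\,-1} - I)$, we
obtain

$$T(\sqrt{S+T}^{\,-1} - I) + (\sqrt{S+T}^{\,-1} - I)T \le T + (\sqrt{S+T}^{\,-1} - I)T(\sqrt{S+T}^{\,-1} - I). \tag{4.41}$$

Furthermore,

$$\sqrt{S+T} \ge \sqrt{S} \ge S \tag{4.42}$$

since $f(x) = \sqrt{x}$ is a matrix monotone function and $0 \le S \le I$. Finally,

$$\begin{aligned}
I - \sqrt{S+T}^{\,-1} S \sqrt{S+T}^{\,-1} &= \sqrt{S+T}^{\,-1} T \sqrt{S+T}^{\,-1} \\
&= T + T(\sqrt{S+T}^{\,-1} - I) + (\sqrt{S+T}^{\,-1} - I)T + (\sqrt{S+T}^{\,-1} - I)T(\sqrt{S+T}^{\,-1} - I) \\
&\le 2T + 2(\sqrt{S+T}^{\,-1} - I)T(\sqrt{S+T}^{\,-1} - I) \\
&\le 2T + 2(\sqrt{S+T}^{\,-1} - I)(S+T)(\sqrt{S+T}^{\,-1} - I) \\
&= 2T + 2(I + S + T - 2\sqrt{S+T}) \\
&\le 2T + 2(I + S + T - 2S) = 2(I - S) + 4T,
\end{aligned}$$

where the first inequality follows from (4.41) and the third inequality follows from
(4.42). Thus, we obtain the matrix inequality (4.40). ∎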
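
The inequality (4.40) can be sanity-checked numerically. The sketch below is restricted to real symmetric 2 × 2 matrices so that eigenvalues and matrix functions have closed forms and only the standard library is needed. It computes the smallest eigenvalue of RHS minus LHS of (4.40), which should be nonnegative up to rounding. All helper names are illustrative, and a random test of this kind supports the lemma but does not replace the proof.

```python
import random
from math import sqrt

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def lin(a, A, b, B):
    """Entrywise a*A + b*B for 2x2 matrices."""
    return [[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]


def mul(A, B):
    """Matrix product of 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sym_eig(A):
    """(largest, smallest) eigenvalue of a real symmetric 2x2 matrix."""
    mean = (A[0][0] + A[1][1]) / 2
    rad = sqrt(((A[0][0] - A[1][1]) / 2) ** 2 + A[0][1] ** 2)
    return mean + rad, mean - rad


def apply_function(A, f):
    """f(A) for a real symmetric 2x2 matrix via its spectral decomposition."""
    hi, lo = sym_eig(A)
    if hi - lo < 1e-14:
        return lin(f(hi), IDENTITY, 0.0, IDENTITY)
    P = lin(1.0 / (hi - lo), A, -lo / (hi - lo), IDENTITY)  # projector onto eigenvalue hi
    return lin(f(hi), P, f(lo), lin(1.0, IDENTITY, -1.0, P))


def hayashi_nagaoka_gap(S, T):
    """Smallest eigenvalue of RHS - LHS of (4.40); assumes S + T is invertible."""
    X = apply_function(lin(1.0, S, 1.0, T), lambda x: x ** -0.5)
    lhs = lin(1.0, IDENTITY, -1.0, mul(mul(X, S), X))
    rhs = lin(2.0, lin(1.0, IDENTITY, -1.0, S), 4.0, T)
    return sym_eig(lin(1.0, rhs, -1.0, lhs))[1]


def random_instance(rng):
    """Random S with 0 <= S <= I and random T >= 0.01 * I."""
    def psd():
        B = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        return mul(B, [[B[j][i] for j in range(2)] for i in range(2)])
    S = psd()
    S = lin(rng.random() / sym_eig(S)[0], S, 0.0, S)  # rescale so that S <= I
    T = lin(1.0, psd(), 0.01, IDENTITY)
    return S, T
```

Calling `hayashi_nagaoka_gap(*random_instance(random.Random(0)))` repeatedly should never return a value below zero beyond rounding error. The case $S = I$, $T = 0$ gives equality.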


Exercise 


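4.9 Show the generalized version of inequality (4.40) under the same conditions as
Lemma 4.5 [11], where $c > 0$ is an arbitrary real number:

$$I - \sqrt{S+T}^{\,-1} S \sqrt{S+T}^{\,-1} \le (1 + c)(I - S) + (2 + c + c^{-1})T. \tag{4.43}$$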


4.5 Proof of Direct Part of C-Q Channel Coding Theorem 


The arguments used for hypothesis testing in Chap. 3 may be reused for the proof of
the direct part (Lemma 4.1) using the following lemma.


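Lemma 4.6 (Hayashi and Nagaoka [11]) Given a c-q channel $x \in \mathcal{X} \mapsto W_x$, there
exists a code $\Phi$ of size $N$ such that

$$\begin{aligned}
\varepsilon[\Phi] &\le \sum_{x \in \mathcal{X}} p(x) \bigl(2 \operatorname{Tr} W_x \{W_x - 2NW_p \le 0\} + 4N \operatorname{Tr} W_p \{W_x - 2NW_p > 0\}\bigr) \\
&= 2 \operatorname{Tr} (p \times W)\{(p \times W) \le 2N p \otimes W_p\} + 4N \operatorname{Tr} p \otimes W_p \{(p \times W) > 2N p \otimes W_p\},
\end{aligned} \tag{4.44}$$

where $p$ is a probability distribution in $\mathcal{X}$, and the matrices $p \times W$ and $p \otimes \sigma$ on
$\mathcal{H} \otimes \mathbb{C}^{|\mathrm{supp}(p)|}$ are defined as follows.

$$p \times W \stackrel{\mathrm{def}}{=} \begin{pmatrix} p(x_1) W_{x_1} & & 0 \\ & \ddots & \\ 0 & & p(x_k) W_{x_k} \end{pmatrix}, \qquad p \otimes \sigma \stackrel{\mathrm{def}}{=} \begin{pmatrix} p(x_1) \sigma & & 0 \\ & \ddots & \\ 0 & & p(x_k) \sigma \end{pmatrix}. \tag{4.45}$$

The RHS of (4.44) is called the dependence test (DT) bound because the projection
$\{(p \times W) > 2N p \otimes W_p\}$ tests the correlated state $p \times W$ against the
independent case $p \otimes W_p$.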

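Before proceeding to the proof, we notice that $I(p, W)$, $I_{1-s}(p, W)$, and $I^{\downarrow}_{1-s}(p, W)$
are written as

$$I(p, W) = D(p \times W \| p \otimes W_p), \tag{4.46}$$

$$I_{1-s}(p, W) = D_{1-s}(p \times W \| p \otimes W_p), \tag{4.47}$$

$$I^{\downarrow}_{1-s}(p, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} D_{1-s}(p \times W \| p \otimes \sigma). \tag{4.48}$$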


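Proof of Lemma 4.1 For simplicity, we use the notations $S_1 := p \times W$ and $S_2 :=
p \otimes W_p$. Applying Lemma 4.6 to the channel $W^{(n)}$ and the $n$-fold independent
and identical distribution of $p \in \mathcal{P}_f(\mathcal{X})$, we can take a code of size $N_n = e^{nR}$ satisfying

$$\begin{aligned}
\varepsilon[\Phi^{(n)}]
&\le \sum_{x^n \in \mathcal{X}^n} p^n(x^n) \bigl(2 \operatorname{Tr} W^{(n)}_{x^n} \{W^{(n)}_{x^n} - 2N_n W_p^{\otimes n} \le 0\}
 + 4N_n \operatorname{Tr} W_p^{\otimes n} \{W^{(n)}_{x^n} - 2N_n W_p^{\otimes n} > 0\}\bigr) \\
&= 2 \operatorname{Tr} S_1^{\otimes n}\{S_1^{\otimes n} - 2N_n S_2^{\otimes n} \le 0\} + 4N_n \operatorname{Tr} S_2^{\otimes n}\{S_1^{\otimes n} - 2N_n S_2^{\otimes n} > 0\} \\
&\stackrel{(a)}{\le} 2 \operatorname{Tr} (S_1^{\otimes n})^{1-s} (2N_n S_2^{\otimes n})^{s} \\
&= 2^{1+s} e^{ns(R - D_{1-s}(S_1\|S_2))} = 2^{1+s} e^{ns(R - I_{1-s}(p, W))}
\end{aligned} \tag{4.49}$$

for $s \in [0, 1]$, where (a) follows from Lemma 3.3 with $A = S_1^{\otimes n}$ and $B = 2N_n S_2^{\otimes n}$. This completes
the proof of Lemma 4.1. ∎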


Proof of Lemma 4.6 We prove this lemma by employing the random coding method,
in which we randomly generate a code $(N, \varphi, Y)$ of fixed size and prove that the
expectation of the average error probability is less than $\epsilon$. Based on the above strategy,
we can show that there exists a code whose average error probability is less than $\epsilon$.
For this purpose, we consider $N$ random variables $X \stackrel{\mathrm{def}}{=} (X_1, \ldots, X_N)$ independently
obeying a probability distribution $p$ in $\mathcal{X}$, define the encoder $\varphi_X(i)$ by $\varphi_X(i) = X_i$,
and denote the expectation by $E_X$.
For a given encoder $\varphi$ of size $N$, a decoder $Y(\varphi)$ is defined as

$$Y(\varphi)_i \stackrel{\mathrm{def}}{=} \Bigl(\sum_{j=1}^{N} \pi_j\Bigr)^{-\frac{1}{2}} \pi_i \Bigl(\sum_{j=1}^{N} \pi_j\Bigr)^{-\frac{1}{2}}, \tag{4.50}$$

$$\pi_i \stackrel{\mathrm{def}}{=} \{W_{\varphi(i)} - 2NW_p > 0\}. \tag{4.51}$$


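Then, the average error probability of the code $\Phi(\varphi) \stackrel{\mathrm{def}}{=} (N, \varphi, Y(\varphi))$ is

\[
\begin{aligned}
\varepsilon[\Phi(\varphi)]
&= \frac{1}{N} \sum_{i=1}^{N} \operatorname{Tr} W_{\varphi(i)} (I - Y(\varphi)_i)
 \le \frac{1}{N} \sum_{i=1}^{N} \operatorname{Tr} W_{\varphi(i)} \Big( 2(I - \pi_i) + 4 \sum_{j \ne i} \pi_j \Big) \\
&= \frac{1}{N} \sum_{i=1}^{N} \Big( \operatorname{Tr} 2 W_{\varphi(i)} (I - \pi_i) + \operatorname{Tr} \Big[ 4 \sum_{j \ne i} W_{\varphi(j)} \Big] \pi_i \Big).
\end{aligned}
\tag{4.52}
\]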


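To evaluate the expectation $\mathrm{E}_X[\varepsilon[\Phi(\varphi_X)]]$, let us rewrite $\mathrm{E}_X[\operatorname{Tr} W_{\varphi_X(i)} (I - \pi_i)]$
and $\mathrm{E}_X[\operatorname{Tr} W_{\varphi_X(j)} \pi_i]$ (for $j \ne i$) as

\[
\begin{aligned}
\mathrm{E}_X [\operatorname{Tr} W_{\varphi_X(i)} (I - \pi_i)] &= \sum_{x \in \mathcal{X}} p(x) \operatorname{Tr} W_x \{ W_x - 2N W_p \le 0 \}, \\
\mathrm{E}_X [\operatorname{Tr} W_{\varphi_X(j)} \pi_i] &= \sum_{x' \in \mathcal{X}} p(x') \sum_{x \in \mathcal{X}} p(x) \operatorname{Tr} W_{x'} \{ W_x - 2N W_p > 0 \} \\
&= \sum_{x \in \mathcal{X}} p(x) \operatorname{Tr} W_p \{ W_x - 2N W_p > 0 \}.
\end{aligned}
\]

$\mathrm{E}_X[\varepsilon[\Phi(\varphi_X)]]$ then becomes

\[
\begin{aligned}
\mathrm{E}_X [\varepsilon[\Phi(\varphi_X)]]
&\le \mathrm{E}_X \Big[ \frac{1}{N} \sum_{i=1}^{N} \Big( \operatorname{Tr} 2 W_{\varphi_X(i)} (I - \pi_i) + \operatorname{Tr} \Big[ 4 \sum_{j \ne i} W_{\varphi_X(j)} \Big] \pi_i \Big) \Big] \\
&= \frac{1}{N} \sum_{i=1}^{N} \Big( 2\, \mathrm{E}_X [\operatorname{Tr} W_{\varphi_X(i)} (I - \pi_i)] + 4 \sum_{j \ne i} \mathrm{E}_X [\operatorname{Tr} W_{\varphi_X(j)} \pi_i] \Big) \\
&= \sum_{x \in \mathcal{X}} p(x) \Big( 2 \operatorname{Tr} W_x \{ W_x - 2N W_p \le 0 \} + 4(N - 1) \operatorname{Tr} W_p \{ W_x - 2N W_p > 0 \} \Big).
\end{aligned}
\tag{4.53}
\]

Since the RHS of this inequality is less than the RHS of (4.44), we see that there
exists a code $\Phi(\varphi)$ that satisfies (4.44). □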


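Proof of Lemma 4.3 We show this lemma using the random coding method, as in
Lemma 4.6. For the channel $W^{(n)}_x$ with $x \in \mathcal{X}^n$, we consider random coding for
the probability distribution $\hat{p}(x) \stackrel{\mathrm{def}}{=} p^n(x)/C_{n,K}$ on the subset $\hat{\mathcal{X}} \stackrel{\mathrm{def}}{=} \{ c^{(n)}(x) \le nK \}$.
In this construction, we choose $\pi_i$ to be $\{ W^{(n)}_{\varphi(i)} - 2N_n W_p^{\otimes n} > 0 \}$. Using the same
notation as that of Lemma 4.6, we obtain

\[
\mathrm{E}_X[\varepsilon[\Phi(\varphi)]] \le \sum_{x \in \hat{\mathcal{X}}} \hat{p}(x) \Big( 2 \operatorname{Tr} \big[ W^{(n)}_x \{ W^{(n)}_x - 2N_n W_p^{\otimes n} \le 0 \} \big]
 + 4N_n \operatorname{Tr} \Big[ \Big( \sum_{x' \in \hat{\mathcal{X}}} \hat{p}(x') W^{(n)}_{x'} \Big) \{ W^{(n)}_x - 2N_n W_p^{\otimes n} > 0 \} \Big] \Big).
\]

By noting that $\hat{p}(x) = p^n(x)/C_{n,K}$ and $\hat{\mathcal{X}} \subset \mathcal{X}^n$, we find

\[
\begin{aligned}
\mathrm{E}_X[\varepsilon[\Phi(\varphi)]]
&\le \sum_{x \in \mathcal{X}^n} \frac{2 p^n(x)}{C_{n,K}} \Big( \operatorname{Tr} \big[ W^{(n)}_x \{ W^{(n)}_x - 2N_n W_p^{\otimes n} \le 0 \} \big]
 + 2N_n \operatorname{Tr} \Big[ \Big( \sum_{x' \in \mathcal{X}^n} \frac{p^n(x')}{C_{n,K}} W^{(n)}_{x'} \Big) \{ W^{(n)}_x - 2N_n W_p^{\otimes n} > 0 \} \Big] \Big) \\
&\le \frac{2}{C_{n,K}^2} \sum_{x \in \mathcal{X}^n} p^n(x) \Big( C_{n,K} \operatorname{Tr} \big[ W^{(n)}_x \{ W^{(n)}_x - 2N_n W_p^{\otimes n} \le 0 \} \big]
 + 2N_n \operatorname{Tr} \big[ W_p^{\otimes n} \{ W^{(n)}_x - 2N_n W_p^{\otimes n} > 0 \} \big] \Big).
\end{aligned}
\]

By using the same arguments as those in the random coding method, the proof is
completed. □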


Exercises 


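4.10 Let $\alpha$ and $\beta$ be defined by the following. Show inequality (4.44) in Lemma 4.6
with its RHS replaced by $\alpha + 2\beta + \sqrt{\beta(\alpha + \beta)}$ using Exercise 4.9.

\[
\alpha \stackrel{\mathrm{def}}{=} \sum_{x \in \mathcal{X}} p(x) \operatorname{Tr} W_x \{ W_x - 2N W_p \le 0 \},
\]

\[
\beta \stackrel{\mathrm{def}}{=} N \sum_{x \in \mathcal{X}} p(x) \operatorname{Tr} W_p \{ W_x - 2N W_p > 0 \}.
\]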


4.11 Consider the sequence of codes $\{\Phi^{(n)}\}$ for the stationary memoryless channel
of the c-q channel $x \mapsto W_x$. Let us focus on the optimal exponential decreasing rate
of the average error probability when the communication rate of $\{\Phi^{(n)}\}$ is greater
than $R$:

\[
B(R|W) \stackrel{\mathrm{def}}{=} \sup_{\{\Phi^{(n)}\}} \Big\{ \lim -\frac{1}{n} \log \varepsilon[\Phi^{(n)}] \,\Big|\, \lim \frac{1}{n} \log |\Phi^{(n)}| > R \Big\}.
\tag{4.54}
\]

This optimal rate is called the reliability function as a function of the communication
rate $R$ and is an important quantity in quantum information theory.
(a) Show that

\[
B(R|W) \ge \sup_{p \in \mathcal{P}_f(\mathcal{X})} \max_{0 \le s \le 1} s \big( I_{1-s}(p, W) - R \big).
\tag{4.55}
\]

(b) When all states $W_x$ are pure states, show that

\[
B(R|W) \ge \sup_{p \in \mathcal{P}_f(\mathcal{X})} \max_{0 \le s \le 1} s \big( H_{1+s}(W_p) - R \big).
\tag{4.56}
\]


4.12 Define another c-q channel capacity by replacing the condition that the aver- 
age error probability goes to 0 by the alternative condition that the maximum error 
probability goes to 0. Show that the modified c-q channel capacity is equal to the 
original c-q channel capacity. 


4.13 Show the inequality $C_c(W^1, \ldots, W^M) \ge \sup_p \min_{1 \le i \le M} I(p, W^i)$ following
the steps below.

(a) Show that there exists a code $\Phi^{(n)}$ with size $e^{nR}$ for the $M$-output channel
$W^1, \ldots, W^M$ such that

\[
\varepsilon[\Phi^{(n)}] \le \max_{1 \le i \le M} M\, 2^{1+s} e^{ns(R - I_{1-s}(p, W^i))}
\]

as an extension of (4.49).
(b) Show the desired inequality.


4.6 Proof of Converse Part of C-Q Channel Coding Theorem


In this section, we prove the converse parts of the c-q channel coding theorem, i.e., 
Lemmas 4.2 and 4.4, by using the information inequality (3.20) proved in Sect. 3.8. 
Before the proof of Lemma 4.2, we prepare the following lemma. 


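Lemma 4.7 Let $\Phi = (N, \varphi, Y)$ be a code with the size $N$, and $\sigma$ be a state. Then,
we have

\[
\varepsilon[\Phi] \ge \beta_{\frac{1}{N}} (S_2(\sigma) \| S_1(\varphi)),
\tag{4.57}
\]

where the density matrices $S_1(\varphi)$ and $S_2(\sigma)$ on $\mathcal{H} \otimes \mathbb{C}^N$ are given as

\[
S_1(\varphi) \stackrel{\mathrm{def}}{=} \frac{1}{N}
\begin{pmatrix} W_{\varphi(1)} & & 0 \\ & \ddots & \\ 0 & & W_{\varphi(N)} \end{pmatrix},
\quad
S_2(\sigma) \stackrel{\mathrm{def}}{=} \frac{1}{N}
\begin{pmatrix} \sigma & & 0 \\ & \ddots & \\ 0 & & \sigma \end{pmatrix}.
\]

When $P_X$ is the uniform distribution over the image of $\varphi$, we have $S_2(\sigma) = P_X \otimes \sigma$
and $S_1(\varphi) = P_X \times W$. Taking the infimum for $P_X$ in both sides in (4.57), we have

\[
\min_{\Phi : |\Phi| = N} \varepsilon[\Phi] \ge \inf_{P_X} \beta_{\frac{1}{N}} (P_X \otimes \sigma \| P_X \times W)
\tag{4.58}
\]

for any state $\sigma$. Then, taking the supremum for $\sigma$, we have

\[
\min_{\Phi : |\Phi| = N} \varepsilon[\Phi] \ge \sup_{\sigma} \inf_{P_X} \beta_{\frac{1}{N}} (P_X \otimes \sigma \| P_X \times W).
\tag{4.59}
\]

The above lower bound is called the meta converse bound, which is helpful for
calculating the lower bound of the minimum decoding error probability.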


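Proof We choose a matrix $T$ as

\[
T \stackrel{\mathrm{def}}{=}
\begin{pmatrix} Y_1 & & 0 \\ & \ddots & \\ 0 & & Y_N \end{pmatrix}.
\]

Since $I \ge T \ge 0$, we have

\[
\operatorname{Tr} S_1(\varphi) T = \sum_{i=1}^{N} \frac{1}{N} \operatorname{Tr} W_{\varphi(i)} Y_i = 1 - \varepsilon[\Phi].
\tag{4.60}
\]

On the other hand, since $I = \sum_{i=1}^{N} Y_i$, we have

\[
\operatorname{Tr} S_2(\sigma) T = \sum_{i=1}^{N} \frac{1}{N} \operatorname{Tr} \sigma Y_i = \frac{1}{N} \operatorname{Tr} \sigma \sum_{i=1}^{N} Y_i = \frac{1}{N}.
\tag{4.61}
\]

Combining these two relations, we obtain (4.57). □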
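The two trace identities used in this proof can be checked directly in the commuting (classical) case with a deterministic decoder, where each $Y_i$ is the indicator of a decoding region and $\sigma$ is a probability distribution on the outputs. The sketch below performs that check; the function name, the argument layout, and the toy channel are assumptions introduced for illustration.

```python
def meta_converse_traces(W, sigma, encoder, regions):
    """Return (Tr S1 T, Tr S2 T) for a classical code (illustrative sketch).

    W[x][y]    : probability of output y given input x
    sigma[y]   : reference output distribution
    encoder[i] : input symbol assigned to message i
    regions[i] : set of outputs decoded as message i (a partition of the outputs)
    """
    N = len(encoder)
    # Tr S1 T: average probability of correct decoding, i.e. 1 - error.
    tr_s1_t = sum(W[encoder[i]][y] for i in range(N) for y in regions[i]) / N
    # Tr S2 T: the regions partition the outputs, so this is always 1/N.
    tr_s2_t = sum(sigma[y] for i in range(N) for y in regions[i]) / N
    return tr_s1_t, tr_s2_t


# Hypothetical three-output channel with two messages.
W = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
sigma = [0.5, 0.3, 0.2]
encoder = [0, 1]
regions = [{0, 2}, {1}]
success, reference = meta_converse_traces(W, sigma, encoder, regions)
print(1 - success, reference)
```

Whatever reference distribution `sigma` is chosen, the second value equals $1/N$, which is why the error probability is bounded below by a hypothesis testing quantity at level $1/N$.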


Now, using (4.60) and (4.61) in the proof of Lemma 4.7, we show Lemma 4.2. 


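Proof of Lemma 4.2 Here, we show Lemma 4.2 only when the maximum $\max_{p \in \mathcal{P}_f(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$
exists. For the general case, see Exercise 4.28. Firstly, given $s \in (-\infty, 0]$,
the distribution

\[
p_{1-s} \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{p \in \mathcal{P}_f(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)
\tag{4.62}
\]

satisfies

\[
\operatorname{Tr} W_x^{1-s} \Big( \sum_{x'} p_{1-s}(x') W_{x'}^{1-s} \Big)^{\frac{s}{1-s}} \le \operatorname{Tr} \Big( \sum_{x'} p_{1-s}(x') W_{x'}^{1-s} \Big)^{\frac{1}{1-s}}
\tag{4.63}
\]

for any $x \in \mathcal{X}$. That is, the state $\sigma_{1-s|p_{1-s}}$ defined in (4.23) satisfies

\[
\operatorname{Tr} W_x^{1-s} \sigma_{1-s|p_{1-s}}^{s} \le \Big( \operatorname{Tr} \Big( \sum_{x'} p_{1-s}(x') W_{x'}^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s}.
\tag{4.64}
\]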
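Assume that a code $\Phi^{(n)}$ for the stationary memoryless quantum channel satisfies
$N_n \stackrel{\mathrm{def}}{=} |\Phi^{(n)}| = e^{nR}$. Then, we have

\[
\begin{aligned}
\log \operatorname{Tr} \Big[ \big( W^{(n)}_{\varphi^{(n)}(i)} \big)^{1-s} \big( \sigma^{\otimes n}_{1-s|p_{1-s}} \big)^{s} \Big]
&= \sum_{l=1}^{n} \log \operatorname{Tr} \Big[ W^{1-s}_{\varphi^{(n)}_l(i)} \sigma^{s}_{1-s|p_{1-s}} \Big]
 = n \sum_{l=1}^{n} \frac{1}{n} \log \operatorname{Tr} \Big[ W^{1-s}_{\varphi^{(n)}_l(i)} \sigma^{s}_{1-s|p_{1-s}} \Big] \\
&\overset{(a)}{\le} n \log \sum_{l=1}^{n} \frac{1}{n} \operatorname{Tr} \Big[ W^{1-s}_{\varphi^{(n)}_l(i)} \sigma^{s}_{1-s|p_{1-s}} \Big] \\
&\overset{(b)}{\le} n \log \Big( \operatorname{Tr} \Big( \sum_{x} p_{1-s}(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s} \\
&= n(1-s) \log \operatorname{Tr} \Big( \sum_{x} p_{1-s}(x) W_x^{1-s} \Big)^{\frac{1}{1-s}}.
\end{aligned}
\tag{4.65}
\]

(a) follows from the concavity of $x \mapsto \log x$ and (b) follows from (4.64).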
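Now, we apply the discussion in the proof of Lemma 4.7 to the case with $\sigma = \sigma^{\otimes n}_{1-s|p_{1-s}}$.
We have

\[
\begin{aligned}
(1 - \varepsilon[\Phi^{(n)}])^{1-s} N_n^{-s}
&\overset{(a)}{=} \big( \operatorname{Tr} S_1(\varphi^{(n)}) T \big)^{1-s} \big( \operatorname{Tr} S_2(\sigma^{\otimes n}_{1-s|p_{1-s}}) T \big)^{s} \\
&\overset{(b)}{\le} \operatorname{Tr} S_1(\varphi^{(n)})^{1-s} S_2(\sigma^{\otimes n}_{1-s|p_{1-s}})^{s}
 = \frac{1}{N_n} \sum_{i=1}^{N_n} \operatorname{Tr} \big( W^{(n)}_{\varphi^{(n)}(i)} \big)^{1-s} \big( \sigma^{\otimes n}_{1-s|p_{1-s}} \big)^{s} \\
&\overset{(c)}{\le} \Big( \operatorname{Tr} \Big( \sum_{x} p_{1-s}(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{n(1-s)}
\end{aligned}
\tag{4.66}
\]

for $s \le 0$, where (a) follows from (4.60) and (4.61) and (c) follows from (4.65).
Here, (b) can be shown from the monotonicity (3.20) of the quantum relative Rényi
entropy in the same way as (3.137).

Thus,

\[
\frac{1}{n} \log (1 - \varepsilon[\Phi^{(n)}]) \le \log \operatorname{Tr} \Big( \sum_{x} p_{1-s}(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} + \frac{s}{1-s} R,
\tag{4.67}
\]

which implies (4.19). □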
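Proof of Lemma 4.4 We show Lemma 4.4 only when the maximum $\max_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$
exists. For the general case, see Exercise 4.29. Firstly, for $s \in (-\infty, 0]$,
we choose $p'_{1-s} \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$. Then, similar to (4.75), the relation

\[
\sum_{x} p(x) \operatorname{Tr} W_x^{1-s} \Big( \sum_{x'} p'_{1-s}(x') W_{x'}^{1-s} \Big)^{\frac{s}{1-s}} \le \operatorname{Tr} \Big( \sum_{x'} p'_{1-s}(x') W_{x'}^{1-s} \Big)^{\frac{1}{1-s}}
\tag{4.68}
\]

holds for any $p \in \mathcal{P}_{c \le K}(\mathcal{X})$. That is, the state $\sigma_{1-s|p'_{1-s}}$ satisfies

\[
\sum_{x} p(x) \operatorname{Tr} W_x^{1-s} \sigma^{s}_{1-s|p'_{1-s}} \le \Big( \operatorname{Tr} \Big( \sum_{x'} p'_{1-s}(x') W_{x'}^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s}
\tag{4.69}
\]

for any $p \in \mathcal{P}_{c \le K}(\mathcal{X})$. Then, using (4.69), we have

\[
\begin{aligned}
\log \operatorname{Tr} \Big[ \big( W^{(n)}_{\varphi^{(n)}(i)} \big)^{1-s} \big( \sigma^{\otimes n}_{1-s|p'_{1-s}} \big)^{s} \Big]
&\le n \log \sum_{l=1}^{n} \frac{1}{n} \operatorname{Tr} \Big[ W^{1-s}_{\varphi^{(n)}_l(i)} \sigma^{s}_{1-s|p'_{1-s}} \Big] \\
&\le n \log \Big( \operatorname{Tr} \Big( \sum_{x} p'_{1-s}(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s} \\
&= n(1-s) \log \operatorname{Tr} \Big( \sum_{x} p'_{1-s}(x) W_x^{1-s} \Big)^{\frac{1}{1-s}}.
\end{aligned}
\tag{4.70}
\]

Hence, similar to the proof of Lemma 4.2, we obtain (4.39). □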
Exercises 


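4.14 Define $J(p, \sigma, W) \stackrel{\mathrm{def}}{=} \sum_{x \in \mathcal{X}} p(x) D(W_x \| \sigma)$. Show the following relations [12,
13], including the existence of the minimums appearing in the relations, by following
the steps below.

\[
\sup_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W) = \sup_{p \in \mathcal{P}_f(\mathcal{X})} \min_{\sigma \in \mathcal{S}(\mathcal{H})} J(p, \sigma, W)
= \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_f(\mathcal{X})} J(p, \sigma, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{x \in \mathcal{X}} D(W_x \| \sigma).
\tag{4.71}
\]

(a) Show that $\min_{\sigma \in \mathcal{S}(\mathcal{H})} J(p, \sigma, W) = I(p, W)$.

(b) Show that $\sigma \mapsto D(W_x \| \sigma)$ is convex. (See (5.38).)

(c) Show the existence of the minimums $\min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_f(\mathcal{X})} J(p, \sigma, W)$ and
$\min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{x \in \mathcal{X}} D(W_x \| \sigma)$ by using Lemma A.8.

(d) Show (4.71) by applying Lemma A.9.

4.15 Give an alternative proof of (4.71) by following the steps below when the maximum
$\max_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W)$ exists.

(a) Choose $p_1 \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W)$. Show that $D(W_x \| W_{p_1}) \le I(p_1, W)$.

(b) Show (4.71).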


4.16 Give an alternative proof of (4.12) by using (4.71). 


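4.17 Similarly to (4.71), show the following relations, including the existence of the
minimums appearing in the relations, by using Lemma A.9.

\[
\sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I(p, W) = \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} \min_{\sigma \in \mathcal{S}(\mathcal{H})} J(p, \sigma, W)
= \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} J(p, \sigma, W).
\tag{4.72}
\]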


4.18 Give an alternative proof of (4.72) similar to Exercise 4.14. 


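4.19 Let $c_A$ and $c_B$ be cost functions on $\mathcal{X}_A$ and $\mathcal{X}_B$. Show the following equation
by using (4.71).

\[
C_{c|c_A + c_B \le K}(W^A \otimes W^B) = \max_{K'} C_{c|c_A \le K'}(W^A) + C_{c|c_B \le K - K'}(W^B).
\tag{4.73}
\]

4.20 Show (4.63) by using the function $f(t) \stackrel{\mathrm{def}}{=} \operatorname{Tr} \big( t W_x^{1-s} + (1-t) \sum_{x'} p_{1-s}(x') W_{x'}^{1-s} \big)^{\frac{1}{1-s}}$
when the maximum $\max_{p \in \mathcal{P}_f(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$ exists.

4.21 Show (4.68) by using the function $f(t) \stackrel{\mathrm{def}}{=} \operatorname{Tr} \big( \sum_{x'} (t p(x') + (1-t) p'_{1-s}(x')) W_{x'}^{1-s} \big)^{\frac{1}{1-s}}$
when the maximum $\max_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$ exists.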


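4.22 Show the following relations, including the existence of the minimums appearing
in the relations, for $s \in [-1, 1] \setminus \{0\}$.

\[
C^{\downarrow}_{1-s}(W) = \sup_{p \in \mathcal{P}_f(\mathcal{X})} \min_{\sigma \in \mathcal{S}(\mathcal{H})} J_{1-s}(p, \sigma, W)
= \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_f(\mathcal{X})} J_{1-s}(p, \sigma, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{x \in \mathcal{X}} D_{1-s}(W_x \| \sigma).
\tag{4.74}
\]

Hint: Use the matrix convexity of $x \mapsto x^s$ for $s \in [-1, 0)$ and the matrix concavity
of $x \mapsto x^s$ for $s \in (0, 1]$.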


4.23 Show (4.17) by following the steps below.

(a) Since the second equation of (4.17) was shown in Exercise 4.14 including the
existence of the minimum $\min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{x \in \mathcal{X}} D(W_x \| \sigma)$, it is sufficient to show the
first equation. Show $\sup_p I(p, W) \le \liminf_{s \to 0} \sup_p I^{\downarrow}_{1-s}(p, W)$.

(b) Show that the convergence $D_{1-s}(W_x \| \sigma) \to D(W_x \| \sigma)$ is uniform for $x$ as $s \to 0$
when the supports of $W_x$ are included in that of $\sigma$. (Hint: Use (c) of Exercise 3.5.)

(c) Show $\sup_p I(p, W) \ge \limsup_{s \to 0} \sup_p I^{\downarrow}_{1-s}(p, W)$ by using the final expression at
(4.74) and $\sigma_1 \stackrel{\mathrm{def}}{=} \operatorname*{argmin}_{\sigma} \sup_x D(W_x \| \sigma)$ (See (c) of Exercise 4.14).

4.24 Give an alternative proof of (4.65) for $s \in [-1, 0]$ by following the steps below.
(a) Show the following by using (4.74).

\[
\sup_{p \in \mathcal{P}_f(\mathcal{X})} \operatorname{Tr} \Big[ \sum_{x} p(x) W_x^{1-s} \Big] \sigma^{s}_{1-s|p_{1-s}} \le \sup_{p \in \mathcal{P}_f(\mathcal{X})} \Big( \operatorname{Tr} \Big( \sum_{x} p(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s}.
\tag{4.75}
\]

(b) Show (4.65).

4.25 Show the following for $s \in [-1, 1] \setminus \{0\}$ by using (4.74).

\[
C^{\downarrow}_{1-s}(W^A \otimes W^B) = C^{\downarrow}_{1-s}(W^A) + C^{\downarrow}_{1-s}(W^B).
\tag{4.76}
\]


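4.26 Show the following relations, including the existence of the minimums appearing
in the relations, for $s \in [-1, 1] \setminus \{0\}$.

\[
C^{\downarrow}_{1-s|c \le K}(W) = \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} \min_{\sigma \in \mathcal{S}(\mathcal{H})} J_{1-s}(p, \sigma, W)
= \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} J_{1-s}(p, \sigma, W).
\tag{4.77}
\]

4.27 Show (4.36) by following the steps below.

(a) Since the second equation of (4.36) was shown in Exercise 4.17 including
the existence of the minimum $\min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} J(p, \sigma, W)$, it is sufficient
to show the first equation. Show $\sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I(p, W) \le \liminf_{s \to 0} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$.

(b) Show that the convergence $J_{1-s}(p, \sigma, W) \to J(p, \sigma, W)$ is uniform for $p$ as $s \to 0$
when the support of $W_x$ is included in that of $\sigma$ for any element $x$ in the support of
$p$. (Hint: Use (c) of Exercise 3.5.)

(c) Show $\sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I(p, W) \ge \limsup_{s \to 0} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$ by using the
final expression at (4.77) and $\sigma_1 \stackrel{\mathrm{def}}{=} \operatorname*{argmin}_{\sigma} \sup_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} J(p, \sigma, W)$ (See Exercise 4.17).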


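4.28 Show Lemma 4.2 in the following way when the maximum $\max_{p \in \mathcal{P}_f(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$
does not necessarily exist.
(a) Choose a sequence of distributions $\{p_n\}$ such that $\lim_{n \to \infty} I^{\downarrow}_{1-s}(p_n, W) = \sup_{p \in \mathcal{P}_f(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$
and the matrix $\sum_{x'} p_n(x') W_{x'}^{1-s}$ converges as $n \to \infty$. So,
we denote the limit of $\sum_{x'} p_n(x') W_{x'}^{1-s}$ by $S_{1-s}$, and define the function
$f(t) \stackrel{\mathrm{def}}{=} \operatorname{Tr} \big( t W_x^{1-s} + (1-t) S_{1-s} \big)^{\frac{1}{1-s}}$ and the state $\sigma_{1-s} \stackrel{\mathrm{def}}{=} S_{1-s}^{\frac{1}{1-s}} / \operatorname{Tr} S_{1-s}^{\frac{1}{1-s}}$. Show that

\[
\operatorname{Tr} W_x^{1-s} S_{1-s}^{\frac{s}{1-s}} \le \operatorname{Tr} S_{1-s}^{\frac{1}{1-s}}
\tag{4.78}
\]

for any $x \in \mathcal{X}$ in the same way as Exercise 4.20.
(b) Show (4.19) by replacing $\sigma_{1-s|p_{1-s}}$ in the proof given in the main body by $\sigma_{1-s}$.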
4.29 Show Lemma 4.4 in the following way when the maximum maxpep..,(x) 
vs ae, W) does not necessarily exist. 

(a) Choose a sequence of distributions {p,} such that limp. EF ie. W)= 
SUP pe P,<¢(X) T_.@, W) and the matrix 7, pa@’ wis converges as n — oo. So, 
we denote the limit of })., pn@’) wis by Sj_,, and define the function f(t) := 


1-s? 
Tr(t >, pe) W!-s + (1 — 1)S1_,) = and the state o_, := S'_./TrSi_,™. Show 
that 


1 


TrwWis)_ oF <Trsi_.= (4.79) 


for any $x \in \mathcal{X}$ in the same way as Exercise 4.21.
(b) Show (4.39) by replacing o1_s\p,_, in the proof given in the main body by o1_,. 


4.30 Give an alternative proof of (4.70) by following the steps below when the
maximum $\max_{p \in \mathcal{P}_{c \le K}(\mathcal{X})} I^{\downarrow}_{1-s}(p, W)$ exists.
(a) Show the following inequality for $s \in [-1, 0]$ by using (4.77).


max Tr[ x wi oy , 
pePc<k(X) 2,Pe) - Isipi-s 


1 l-s 


T=s 
< max Tr wis . 4.80 
< mx, (So : ) (4.80) 


(b) Show (4.70). 


4.31 Define another c-q channel capacity under cost constraint by replacing the 


£W) 


condition that the maximum cost max; = 


is less than the given cost K by the 


, ae (M(oM(i)) . c 
alternative condition that the average cost x 3, oe) is less than the given cost 
K. Show that the modified c-q channel capacity under cost constraint is equal to the 
original c-q channel capacity under cost constraint following the steps below. 


: def 
(a) First, assume that c(xo) = 0 or redefine the cost as c(x) — c(xo), where xo = 
argmin,<yC(x). Let a code &” = (N,, yp, Y”) satisfy e[®™] > 0 and 2 >, 
ew o < K. For arbitrary 5 > 0, focus a code 649") — (Nay syn, GO, 
Y(+9")) satisfying Nason =N,, GOO (7) = p™ (i) ® ween, and pide 
= 
Show that there exist k @ [(1 — ras) Nal messages i), ..., ig such that c+”) 
(POT (i) SK. 


(b) Examine the subcode of eo” consisting of [(1 — ry) Nnl messages, and show 
that the rate of this subcode is asymptotically equal to = limy-+o0 1 log ||. 
(c) Show that the modified capacity is equal to the original capacity. (Note that this
method gives the strong converse concerning the modified c-q channel capacity by
combining the strong converse of the original capacity.)

4.32 Show the inequality $C_c(W^1, \ldots, W^M) \le \sup_p \min_{1 \le i \le M} I(p, W^i)$ following
the steps below.
(a) Show that there exists a distribution $p$ for any distribution $p^{(n)}$ on $\mathcal{X}^n$ such that

\[
n I(p, W^i) \ge I(p^{(n)}, (W^i)^{(n)}), \quad i = 1, \ldots, M
\]

from (4.3) and (4.5).
(b) Show the desired inequality using Fano inequality (2.35).

4.7 Pseudoclassical Channels 


Finally, we again treat the capacity of a c-q channel when the quantum correlation is not
allowed in the measuring apparatus. In Sect. 4.2 we showed that the c-q channel
capacity is not improved even when feedback and adaptive decoding are allowed in
encoding as long as the quantum correlation is not used in the measuring apparatus.
That is, the capacity can be attained when the optimal measurement with a single
transmission is performed on each system. Then, we may ask, when does the c-q
channel capacity $\tilde{C}_c(W)$ with individual measurements equal the channel capacity
$C_c(W)$ with the quantum correlation in the measuring apparatus? The answer to this
question is the subject of the following theorem.


Theorem 4.4 (Fujiwara and Nagaoka [8]) Suppose that $\operatorname{Tr} W_x W_{x'} \ne 0$ for any
$x, x' \in \mathcal{X}$. Then, the following three conditions with respect to the c-q channel $W$ are
equivalent if $\mathcal{X}$ is compact.

① There exists a distribution $p \in \mathcal{P}_f(\mathcal{X})$ such that $[W_x, W_{x'}] = 0$ for any two elements
$x, x' \in \mathrm{supp}(p)$ and $I(p, W) = C_c(W)$.

② $\tilde{C}_c(W) = C_c(W)$.

③ There exists an integer $n$ such that $\frac{\tilde{C}_c(W^{(n)})}{n} = C_c(W)$.


A quantum channel W is called pseudoclassical if it satisfies the above conditions. 
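The commutation requirement in the first condition of Theorem 4.4 is easy to test numerically for a finite family of density matrices. The following Python sketch does only that; it does not check the second requirement that the distribution attains the capacity. The helper names `matmul`, `commute`, and `pairwise_commuting` and the nested-list matrix representation are assumptions made for this illustration.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commute(A, B, tol=1e-12):
    """True if the commutator [A, B] vanishes entrywise up to tol."""
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return all(abs(AB[i][j] - BA[i][j]) <= tol
               for i in range(n) for j in range(n))


def pairwise_commuting(states, tol=1e-12):
    """True if every pair of matrices in the list commutes."""
    return all(commute(states[a], states[b], tol)
               for a in range(len(states))
               for b in range(a + 1, len(states)))


# Hypothetical qubit examples: two diagonal mixed states commute,
# while |0><0| and |+><+| do not (their overlap Tr is 1/2, hence nonzero).
diag_states = [[[0.9, 0.0], [0.0, 0.1]], [[0.3, 0.0], [0.0, 0.7]]]
zero_plus = [[[1.0, 0.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
print(pairwise_commuting(diag_states), pairwise_commuting(zero_plus))
```

A family that fails this test on every candidate support cannot satisfy the first condition, whatever input distribution is used.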


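Proof Since ①⇒② and ②⇒③ by inspection, we show that ③⇒①. The proof given
below uses Theorems 3.6 and 4.5 (Naimark extension [14]). The proofs for these
theorems will be given later.


Theorem 4.5 (Naimark [14]) Given a POVM $M = \{M_\omega\}_{\omega \in \Omega}$ on $\mathcal{H}_A$ with a finite
probability space $\Omega$, there exist a space $\mathcal{H}_B$, a state $\rho_0$ on $\mathcal{H}_B$, and a PVM $E =
\{E_\omega\}_{\omega \in \Omega}$ in $\mathcal{H}_A \otimes \mathcal{H}_B$ such that

\[
\operatorname{Tr}_A \rho M_\omega = \operatorname{Tr}_{A,B} (\rho \otimes \rho_0) E_\omega, \quad \forall \rho \in \mathcal{S}(\mathcal{H}_A), \ \forall \omega \in \Omega.
\]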


For the proof of Theorem 4.5, see Exercise 5.7 or the comments regarding
Theorem 7.1.
Using Condition ③, we choose a measurement $M^{(n)}$ on $\mathcal{H}^{\otimes n}$ and a distribution
$p^{(n)}$ on $\mathcal{X}^n$ such that $I(p^{(n)}, M^{(n)}, W^{(n)}) = \tilde{C}_c(W^{(n)})$. Since

\[
I(p^{(n)}, M^{(n)}, W^{(n)}) \le I(p^{(n)}, W^{(n)}), \quad \tilde{C}_c(W^{(n)}) = n C_c(W) \ge I(p^{(n)}, W^{(n)}),
\]

we obtain $I(p^{(n)}, M^{(n)}, W^{(n)}) = I(p^{(n)}, W^{(n)})$. This is equivalent to


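\[
\sum_{x \in \mathrm{supp}(p^{(n)})} p^{(n)}(x) \Big( D \big( P^{M^{(n)}}_{W^{(n)}_x} \big\| P^{M^{(n)}}_{W^{(n)}_{p^{(n)}}} \big) - D \big( W^{(n)}_x \big\| W^{(n)}_{p^{(n)}} \big) \Big) = 0,
\]

and we have $D \big( P^{M^{(n)}}_{W^{(n)}_x} \big\| P^{M^{(n)}}_{W^{(n)}_{p^{(n)}}} \big) = D \big( W^{(n)}_x \big\| W^{(n)}_{p^{(n)}} \big)$ for $x \in \mathrm{supp}(p^{(n)})$, where $x^n$ is
simplified to $x$. Following Theorem 4.5, we choose an additional system $\mathcal{H}_A$, a pure state
$\rho_A$ on $\mathcal{H}_A$, and a PVM $E = \{E_k\}$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}^{\otimes n}$ such that

\[
D \big( P^{E}_{W^{(n)}_x \otimes \rho_A} \big\| P^{E}_{W^{(n)}_{p^{(n)}} \otimes \rho_A} \big) = D \big( W^{(n)}_x \otimes \rho_A \big\| W^{(n)}_{p^{(n)}} \otimes \rho_A \big).
\]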

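According to Theorem 3.6, we take a real number $a_k(x)$ for every $x \in \mathrm{supp}(p^{(n)})$ such
that the Hermitian matrix $X_x \stackrel{\mathrm{def}}{=} \sum_k a_k(x) E_k$ satisfies $W^{(n)}_x \otimes \rho_A = (W^{(n)}_{p^{(n)}} \otimes \rho_A) X_x$.
Since $X_x$ is Hermitian, we obtain

\[
\begin{aligned}
(W^{(n)}_x \otimes \rho_A)(W^{(n)}_{x'} \otimes \rho_A)
&= (W^{(n)}_{p^{(n)}} \otimes \rho_A) X_x X_{x'} (W^{(n)}_{p^{(n)}} \otimes \rho_A) \\
&= (W^{(n)}_{p^{(n)}} \otimes \rho_A) X_{x'} X_x (W^{(n)}_{p^{(n)}} \otimes \rho_A)
 = (W^{(n)}_{x'} \otimes \rho_A)(W^{(n)}_x \otimes \rho_A)
\end{aligned}
\]

for $x, x' \in \mathrm{supp}(p^{(n)})$. Therefore, we obtain

\[
W^{(n)}_x W^{(n)}_{x'} = W^{(n)}_{x'} W^{(n)}_x.
\]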

Defining $p^{(n)}_i$ by $p^{(n)}_i(x) \stackrel{\mathrm{def}}{=} \sum_{x^n = (x_1, \ldots, x_n) : x_i = x} p^{(n)}(x^n)$, from Exercise 1.23 we find that
$W_x$ and $W_{x'}$ commute with each other for any two elements $x, x' \in \mathrm{supp}(p^{(n)}_i)$ because
$\operatorname{Tr} W_x W_{x'} \ne 0$ for any $x, x' \in \mathcal{X}$. Equation (4.5) yields

\[
\sum_{i=1}^{n} I(p^{(n)}_i, W) \ge I(p^{(n)}, W^{(n)}) = n C_c(W).
\]

Therefore, $I(p^{(n)}_i, W) = C_c(W)$, and thus we obtain ①. □


4.8 Historical Note


4.8.1 C-Q Channel Capacity 


Here, we briefly mention the history of the c-q channel coding theorem. Since this
problem was independently formulated by several researchers, it is difficult to determine
who formulated it first. The first important achievement for this theorem is the
inequality

\[
I(p, W) \ge I(M, p, W),
\tag{4.81}
\]

which was conjectured by Levitin [15] and proved by Holevo [2]. Indeed, this
inequality can be proved easily from the monotonicity of the quantum relative
entropy (5.36) [16, 17]; however, at that time, the monotonicity had not been proved. During that
period, the strong subadditivity of the von Neumann entropy (5.83) was proved by
Lieb and Ruskai [18, 19]. Using the strong subadditivity, we can easily prove the
above inequality; however, this relation between the strong subadditivity and
inequality (4.81) was not known at that time. Combining inequality (4.81) with
Fano's inequality, Holevo [3] showed that the weaker version of the converse part,
i.e., $C_c(W) \le \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W)$, held. Twenty years later, Ogawa and Nagaoka [6]
proved the strong converse part $C_c^{\dagger}(W) \le \sup_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W)$. Moreover, Nagaoka
[20] invented a simpler proof of the strong converse. His proof is based on
the relation with the hypothesis testing explained in the next subsection and the
monotonicity of the Rényi entropy (3.20). In this book, we prove (3.20) using elementary
knowledge in Sect. 3.8 and give a proof of the strong converse part combining
Nagaoka's proof and (3.20).

Regarding the direct part, in the late 1970s, Stratonovich and Vantsjan [21] treated
the pure-state case, i.e., the case in which all the states $W_x$ are pure. In this case, $C_c(W)$
is equal to $\sup_p H(W_p)$, but they found the lower bound $\sup_p -\log \operatorname{Tr} W_p^2$ of $C_c(W)$,
i.e., they proved that $\sup_p -\log \operatorname{Tr} W_p^2 \le C_c(W)$. Sixteen years later, Hausladen et al.
[22] proved the attainability of $\sup_p H(W_p)$ in the pure-state case. This result was
presented by Jozsa, who is a coauthor of that paper, at the QCMC'96 conference
held at Hakone in Japan. Holevo attended this conference and extended this proof to
the mixed-state case during his stay at Tamagawa University after this conference.
Later, Schumacher and Westmoreland [5] independently obtained the same result.
Their method was based on the conditional typical sequence, and its classical version
appeared in Cover and Thomas [23]. Therefore, we can conclude that Holevo played
a central role in the formulation of the c-q channel coding theorem. Hence, some
researchers call the capacity $C_c(W)$ the Holevo capacity, while Theorem 4.1 is
called the HSW theorem. Due to this achievement, Holevo received the Shannon Award
in 2015, which is the most prestigious award in information theory.

In the classical case, Csiszár and Körner [24] have established the type method,
which is a unified method in classical information theory and is partially summarized
in Sect. 2.4.1. Applying it to its classical version, the researchers obtained
another proof of this theorem and examined channel coding in greater detail. Winter
[25, 26] tried to apply the type method to c-q channels. He obtained another proof of
the c-q channel coding theorem but could not obtain an analysis of the error exponents
as precise as that by Csiszár and Körner. Since there is an ambiguity regarding the
orthogonal basis in the quantum case, a simple application of the type method to the
c-q channel is not as powerful as the application to the classical case. To resolve this
problem, Hayashi [27] invented a different method for universal channel coding, and
succeeded in giving the universal channel coding for the c-q channel. Also, Bjelakovic
and Boche [28] independently showed the same fact by another approach.


4.8.2 Hypothesis Testing Approach 


As another unified method in classical information theory, Han [29] established
the method of information spectrum. Verdú and Han [30] applied it to classical
channel coding and succeeded in obtaining the capacity of a general sequence of
classical channels without any assumption, e.g., stationary memoryless, etc. This
result suggests the relation between channel coding and hypothesis testing. Based
on the result, around 2000, Nagaoka proposed an idea to understand all topics in
information theory based on binary hypothesis testing in the classical and quantum
setting. He considered that this idea holds without the independent and identical/
memoryless condition because the results of information spectrum [29] hold in the
general sequence of information sources and channels. As evidence of this idea,
he showed Lemma 4.7 in 2000 [20], whose classical case was shown later by Polyanskiy
et al. [31] as the meta converse theorem. This method greatly simplifies the proof
of the converse part.

Further, Ogawa and Nagaoka [32] extended Verdú and Han's method to the quantum
case and obtained another proof of the direct part of the c-q channel coding theorem.
Their result also supports Nagaoka's idea, i.e., it clarifies the relation between
c-q channel coding and quantum hypothesis testing. However, they could
not obtain the capacity of the general sequence in the quantum case. Motivated by
their proof, Hayashi and Nagaoka [11] derived Lemma 4.6, which further clarifies the
relation between c-q channel coding and quantum hypothesis testing, and whose
classical case was shown later by Polyanskiy et al. [31] as the dependence test (DT) bound.
In this way, several fundamental results were shown first in the quantum case.
Then, a decade later, the classical cases were shown independently as special
cases. Recently, many researchers in classical and quantum information theory have become
interested in this direction because this kind of hypothesis testing approach provides
a unified viewpoint for information theory. They have produced many results that support
Nagaoka's idea, i.e., results that clarify the relation between
respective topics in classical and quantum information theory and binary hypothesis
testing. In particular, the second-order asymptotic analysis has been discussed
in classical and quantum information theory, and this kind of hypothesis testing
approach plays an essential role in it.


4.8.3 Other Topics 


Moreover, we sometimes discuss the exponential decreasing rate of the error probability
(error exponent) in channel coding. Burnashev and Holevo [33] first obtained the lower bound
of the optimal error exponent in the pure-state case, which is equal to (4.56). Their
method differs from the method of Exercise 4.11. In the mixed-state case, combining
the dependence test (DT) bound and the Hoeffding bound, Hayashi [34] obtained the lower
bound (4.55) of the optimal error exponent by the same method as Exercise 4.11.
Dalai [35] derived an upper bound of the optimal error exponent as the quantum
version of the sphere-packing bound. Unfortunately, the lower bound does not match
the upper bound because the lower bound (4.55) is different from the tight bound
even in the classical case. So, to obtain the tight bound for the optimal error exponent
in the quantum case, we need to improve the lower bound (4.55).

In addition, Fujiwara and Nagaoka [8] discussed coding protocols with adaptive
decoding and feedback and obtained Theorem 4.2. They also introduced pseudoclassical
channels (Sect. 4.7) and obtained the equivalence of Conditions ① and ② in
Theorem 4.4. This textbook slightly improves their proof and proves the equivalence
among the three Conditions ①, ②, and ③. Bennett et al. [36] obtained an interesting
result regarding the classical capacity with feedback and quantum correlation. The
c-q channel capacity with a cost constraint was first treated by Holevo [10], and its
strong converse part was shown by Hayashi and Nagaoka [11].

On the other hand, Stratonovich and Vantsjan [21] found the result of $-\log \operatorname{Tr} W_p^2$
and not $H(W_p)$ due to some weak evaluations with respect to the error probability
for the pure-state case. It would be interesting to determine the difference between
these two quantities. Fujiwara [37] considered an ensemble of pure states generated
randomly under the asymptotic setting. He focused on two types of orthogonality
relations and found that the two quantities $H(W_p)$ and $-\log \operatorname{Tr} W_p^2$ correspond to
their respective orthogonality relations.


4.9 Solutions of Exercises 


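Exercise 4.1 Since $H(p|u\rangle\langle u| + (1-p)|v\rangle\langle v|) = H((1-p)|u\rangle\langle u| + p|v\rangle\langle v|)$, the
concavity implies $H(\tfrac{1}{2}|u\rangle\langle u| + \tfrac{1}{2}|v\rangle\langle v|) \ge H((1-p)|u\rangle\langle u| + p|v\rangle\langle v|)$. Show
that the larger eigenvalue of $\tfrac{1}{2}|u\rangle\langle u| + \tfrac{1}{2}|v\rangle\langle v|$ is $\frac{1 + |\langle u|v\rangle|}{2}$.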


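Exercise 4.2

\[
\begin{aligned}
& I(p_A, W^A) + I(p_B, W^B) - I(p, W^A \otimes W^B) \\
={}& \sum_{x_A, x_B} p_A(x_A) p_B(x_B) D(W^A_{x_A} \otimes W^B_{x_B} \| W^A_{p_A} \otimes W^B_{p_B}) \\
& - \sum_{x_A, x_B} p(x_A, x_B) D(W^A_{x_A} \otimes W^B_{x_B} \| W^A_{p_A} \otimes W^B_{p_B}) \\
& + \sum_{x_A, x_B} p(x_A, x_B) D(W^A_{x_A} \otimes W^B_{x_B} \| W^A_{p_A} \otimes W^B_{p_B}) \\
& - \sum_{x_A, x_B} p(x_A, x_B) D(W^A_{x_A} \otimes W^B_{x_B} \| (W^A \otimes W^B)_p) \\
={}& \sum_{x_A, x_B} \big( p_A(x_A) p_B(x_B) - p(x_A, x_B) \big) \big( D(W^A_{x_A} \| W^A_{p_A}) + D(W^B_{x_B} \| W^B_{p_B}) \big) \\
& + \sum_{x_A, x_B} p(x_A, x_B) \Big( -\operatorname{Tr} (W^A_{x_A} \otimes W^B_{x_B}) \log (W^A_{p_A} \otimes W^B_{p_B})
 + \operatorname{Tr} (W^A_{x_A} \otimes W^B_{x_B}) \log (W^A \otimes W^B)_p \Big) \\
={}& D \big( (W^A \otimes W^B)_p \big\| W^A_{p_A} \otimes W^B_{p_B} \big) \ge 0.
\end{aligned}
\]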


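Exercise 4.3 It is sufficient to show that

\[
\max_{\sigma} \operatorname{Tr} \sum_{x} p(x) W_x^{1-s} \sigma^{s} = \Big( \operatorname{Tr} \Big( \sum_{x} p(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s}, \quad \forall s \in [0, 1],
\tag{4.82}
\]

\[
\min_{\sigma} \operatorname{Tr} \sum_{x} p(x) W_x^{1-s} \sigma^{s} = \Big( \operatorname{Tr} \Big( \sum_{x} p(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s}, \quad \forall s \in (-\infty, 0].
\tag{4.83}
\]

Using the matrix Hölder inequality (A.26) $|\operatorname{Tr} XY| \le \big( \operatorname{Tr} |X|^{\frac{1}{1-s}} \big)^{1-s} \big( \operatorname{Tr} |Y|^{\frac{1}{s}} \big)^{s}$, we have

\[
\operatorname{Tr} \sum_{x} p(x) W_x^{1-s} \sigma^{s}
\le \Big( \operatorname{Tr} \Big( \sum_{x} p(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s} \big( \operatorname{Tr} (\sigma^{s})^{\frac{1}{s}} \big)^{s}
= \Big( \operatorname{Tr} \Big( \sum_{x} p(x) W_x^{1-s} \Big)^{\frac{1}{1-s}} \Big)^{1-s}.
\]

The equality holds when $\sigma = \big( \sum_x p(x) W_x^{1-s} \big)^{\frac{1}{1-s}} / \operatorname{Tr} \big( \sum_x p(x) W_x^{1-s} \big)^{\frac{1}{1-s}}$. Hence, we
obtain (4.82).

Equation (4.83) can be shown similarly by replacing the role of the matrix Hölder
inequality (A.26) by the reverse matrix Hölder inequality (A.28).
Exercise 4.4 The relation (4.16) can be shown as follows.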


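\[
\begin{aligned}
\lim_{s \to 0} I_{1-s}(p, W) &= \frac{d\, s I_{1-s}(p, W)}{ds} \Big|_{s=0}
= \frac{\sum_x p(x) \operatorname{Tr} W_x^{1-s} (\log W_x - \log W_p) W_p^{s}}{\sum_x p(x) \operatorname{Tr} W_x^{1-s} W_p^{s}} \Big|_{s=0} \\
&= \sum_{x} p(x) \operatorname{Tr} W_x (\log W_x - \log W_p) = I(p, W).
\end{aligned}
\]

The relation (4.24) can be shown as follows.

\[
\begin{aligned}
\lim_{s \to 0} I^{\downarrow}_{1-s}(p, W)
&= \lim_{s \to 0} \frac{-(1-s) \log \operatorname{Tr} \big( \sum_x p(x) W_x^{1-s} \big)^{\frac{1}{1-s}}}{s}
= - \frac{d \log \operatorname{Tr} \big( \sum_x p(x) W_x^{1-s} \big)^{\frac{1}{1-s}}}{ds} \Big|_{s=0} \\
&= - \frac{\operatorname{Tr} \frac{1}{(1-s)^2} \big( \sum_x p(x) W_x^{1-s} \big)^{\frac{1}{1-s}} \log \big( \sum_x p(x) W_x^{1-s} \big)}{\operatorname{Tr} \big( \sum_x p(x) W_x^{1-s} \big)^{\frac{1}{1-s}}} \Big|_{s=0}
+ \frac{\operatorname{Tr} \frac{1}{1-s} \big( \sum_x p(x) W_x^{1-s} \big)^{\frac{s}{1-s}} \sum_x p(x) W_x^{1-s} \log W_x}{\operatorname{Tr} \big( \sum_x p(x) W_x^{1-s} \big)^{\frac{1}{1-s}}} \Big|_{s=0} \\
&= - \operatorname{Tr} \Big( \sum_x p(x) W_x \Big) \log \Big( \sum_x p(x) W_x \Big) + \operatorname{Tr} \sum_x p(x) W_x \log W_x = I(p, W).
\end{aligned}
\]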
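The limit $s \to 0$ discussed in this solution can also be observed numerically in the commuting (classical) case. The sketch below assumes the form $I_{1-s}(p, W) = -\frac{1}{s} \log \sum_x p(x) \sum_y W_x(y)^{1-s} W_p(y)^{s}$ for a classical channel and compares it with the mutual information for small $|s|$. The function names and the toy channel are assumptions introduced for illustration.

```python
import math


def mutual_information(p, W):
    """I(p, W) = sum_x p(x) sum_y W_x(y) (log W_x(y) - log W_p(y)), commuting case."""
    W_p = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    return sum(p[x] * W[x][y] * (math.log(W[x][y]) - math.log(W_p[y]))
               for x in range(len(p)) for y in range(len(W_p))
               if p[x] > 0 and W[x][y] > 0)


def renyi_mutual_information(p, W, s):
    """I_{1-s}(p, W) in the commuting case, for s != 0 (assumed classical form)."""
    W_p = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    total = sum(p[x] * W[x][y] ** (1 - s) * W_p[y] ** s
                for x in range(len(p)) for y in range(len(W_p))
                if p[x] > 0 and W[x][y] > 0)
    return -math.log(total) / s


# Hypothetical binary channel used only for this check.
p = [0.3, 0.7]
W = [[0.9, 0.1], [0.2, 0.8]]
for s in (0.1, 0.01, 0.001, -0.001):
    print(s, renyi_mutual_information(p, W, s), mutual_information(p, W))
```

The printed values of the Rényi quantity approach the mutual information as $|s|$ decreases, in line with (4.16).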


Exercise 4.5 Equation (4.22) implies (4.25). 


Exercise 4.6 Since all of W,, are commutative with each other, we can denote W,. by 
Dy W,.0) |y) (y|. For s € [—1, 0), the Holder inequality (A.25) implies that 


ehi-sP.W) > p@) De W. Gy W, (y)s 


xy y 
=>) > p&)'*W.0) 0) WO) W, 'Q)) 
y x 


—s 


(Xwmorws'oy-r*) 
(Xpowow,'or) 


+s 


< ¥(Teetmon) 
y x 

= ¥(Leetmon) 
y x 

= ¥(Lee'vmon) 


y
l+s
_ -sI*, (pW) 
= (Spe =e “ns : 
x 


Replacing the Hölder inequality (A.25) by the reverse Hölder inequality (A.27), we
can show the case with $s \in (0, 1]$.


Exercise 4.7 The $\le$ part in (4.33) follows from $I(M, p, W) \le I(p, W)$. Use the Fano
inequality noting the definition of $C(W)$ for the proof of the $\ge$ part.


Exercise 4.8 It is enough to show that min,-1;¢e<x H(o) = H(pe,x). Define the 
state p := e “/ Tre“. Fora given state 7, we choose the state p, such that H(c) = 
H(p,). Due to Exercise 3.9, — Tr cE + log Tr e~® — H(c) = D(o||p) > D(ps\|p) = 
—TrpsE +logTre“ — H(p;), which implies that TroE < Trp;E. Hence, 
MaXy-1; cE<K H(o) = MaXs-Tr p.E<K (ps). Since H (p,) = w(s|p) —_ see, aio.) = 
—sf0Glo < 0. Thus, H(p;) is monotonically decreasing for s. On the other hand, 
4 Tr psE = —Tr psE” + (Tr psE) < 0. Thus, Tr p,E is monotonically decreasing 
for s. Therefore, max,-t; p,.e<xK H(ps) is realized when maximum energy Tr p,E is 
realized. That is, the state pg,x realizes the maximum entropy max;-1; p,.7¢<k H(ps). 
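The monotonicity of the energy $\operatorname{Tr} \rho_s E$ in $s$ used in this solution can be explored numerically when $E$ is diagonal. The sketch below assumes the Gibbs form $\rho_s = e^{-sE} / \operatorname{Tr} e^{-sE}$ and finds, by bisection, the parameter $s \ge 0$ at which the mean energy equals a prescribed cost $K$. The function names, the bisection bounds, and the example spectrum are assumptions made for illustration.

```python
import math


def gibbs(energies, s):
    """Eigenvalues of rho_s = exp(-s E) / Tr exp(-s E) for a diagonal E."""
    weights = [math.exp(-s * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def mean_energy(energies, s):
    """Tr rho_s E for a diagonal E."""
    return sum(q * e for q, e in zip(gibbs(energies, s), energies))


def entropy(q):
    """Entropy (natural logarithm) of a probability vector."""
    return -sum(v * math.log(v) for v in q if v > 0)


def solve_s(energies, K, lo=0.0, hi=50.0, iters=200):
    """Bisection for s in [lo, hi] with Tr rho_s E = K.

    Relies on the mean energy being non-increasing in s, and assumes
    mean_energy(hi) < K < mean_energy(lo).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > K:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical three-level spectrum and cost bound.
energies = [0.0, 1.0, 2.0]
K = 0.5
s_star = solve_s(energies, K)
print(s_star, mean_energy(energies, s_star), entropy(gibbs(energies, s_star)))
```

For a cost bound below the unconstrained mean energy, the entropy of the resulting state is smaller than $\log 3$, the entropy of the completely mixed state at $s = 0$.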


Exercise 4.9 From $(A - cB)^*(A - cB) \ge 0$ we have $A^*B + B^*A \le c^{-1} A^*A + c B^*B$.
Exercise 4.10 Consider the case of $c = \sqrt{\beta/(\alpha + \beta)}$.


Exercise 4.11

(a) Equation (4.55) follows from (4.49).

(b) In this case, $-s I_{1-s}(p, W) = \log \sum_x p(x) \operatorname{Tr} W_x W_p^{s} = \log \operatorname{Tr} W_p W_p^{s} = \log \operatorname{Tr} W_p^{1+s}
= -s H_{1+s}(W_p)$. Hence, (4.55) yields (4.56).


Exercise 4.12 Order the $N_n$ signals by their error probabilities from smallest to largest, and note that the error
probability of each of the first $N_n/2$ signals is at most twice the average error probability.
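The expurgation argument in this solution can be sketched in a few lines: sort the messages by individual error probability and keep the better half. The function name `expurgate` and the sample error probabilities below are assumptions made for illustration.

```python
def expurgate(error_probs):
    """Keep the half of the messages with the smallest individual error probability.

    Returns the kept message indices and the maximum error probability among them.
    """
    order = sorted(range(len(error_probs)), key=lambda i: error_probs[i])
    kept = order[: len(order) // 2]
    return kept, max(error_probs[i] for i in kept)


# Hypothetical per-message error probabilities of a code with six messages.
errors = [0.01, 0.4, 0.02, 0.03, 0.5, 0.0]
average = sum(errors) / len(errors)
kept, max_error = expurgate(errors)
print(kept, max_error, 2 * average)
```

Halving the code size costs only $\frac{1}{n} \log 2$ in rate, which vanishes as $n \to \infty$, so the capacity under the maximum error criterion is not smaller than under the average error criterion.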


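Exercise 4.13

(a) Applying the Markov inequality (2.158) to (4.49) for each channel $(W^i)^{(n)}$, we
have

\[
\mathrm{P}_X \big\{ \varepsilon[\Phi_X] > M\, 2^{1+s} e^{ns(R - I_{1-s}(p, W^i))} \big\} < \frac{1}{M}
\]

for $i = 1, \ldots, M$, where $\varepsilon[\Phi_X]$ is evaluated for the channel $(W^i)^{(n)}$. Then, there exists a code
$\Phi^{(n)}$ with size $e^{nR}$ whose error probability for the channel $(W^i)^{(n)}$ is at most
$M\, 2^{1+s} e^{ns(R - I_{1-s}(p, W^i))}$ for $i = 1, \ldots, M$.

(b) When $R < \sup_p \min_{1 \le i \le M} I(p, W^i)$ and $p$ realizes $\sup_p \min_{1 \le i \le M} I(p, W^i)$, the
quantity $\max_i M\, 2^{1+s} e^{ns(R - I_{1-s}(p, W^i))}$ goes to zero.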


Exercise 4.14

(a) The relation $J(p, \sigma, W) - I(p, W) = D(W_p \| \sigma) \ge 0$ holds. The equality condition
is $\sigma = W_p$.
(b) Since $x \mapsto -\log x$ is matrix convex, $\sigma \mapsto D(W_x \| \sigma)$ is convex.

(c) Assume that $\sigma$ is not full-rank. Since there exists an element $x$ such that the support
of $\sigma$ does not contain the support of $W_x$, $\sup_x D(W_x \| \sigma) = \infty$. So, when we define the
function $\sigma \mapsto \sup_x D(W_x \| \sigma)$ on the set of full-rank densities, it satisfies the condition
of Lemma A.8. So, Lemma A.8 implies the existence of $\min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{x \in \mathcal{X}} D(W_x \| \sigma)$.
Since $\sup_{p \in \mathcal{P}_f(\mathcal{X})} J(p, \sigma, W) = \sup_{x \in \mathcal{X}} D(W_x \| \sigma)$, the minimum $\min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_f(\mathcal{X})}
J(p, \sigma, W)$ also exists.

(d) Since $\sigma \mapsto J(p, \sigma, W)$ is convex and $p \mapsto J(p, \sigma, W)$ is linear, Lemma A.9 guarantees
that $\sup_{p \in \mathcal{P}_f(\mathcal{X})} \min_{\sigma \in \mathcal{S}(\mathcal{H})} J(p, \sigma, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_f(\mathcal{X})} J(p, \sigma, W)$.
Since $\sup_{p \in \mathcal{P}_f(\mathcal{X})} J(p, \sigma, W) = \sup_x D(W_x \| \sigma)$, we obtain (4.71).


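Exercise 4.15

(a) Define the function $f(t) \stackrel{\mathrm{def}}{=} t D(W_x \| (1-t) W_{p_1} + t W_x) + \sum_{x'} (1-t) p_1(x')
D(W_{x'} \| (1-t) W_{p_1} + t W_x)$. We have $0 \ge f'(0) = D(W_x \| W_{p_1}) - \sum_{x'} p_1(x') D(W_{x'} \|
W_{p_1}) + \operatorname{Tr} (-W_{p_1} + W_x) = D(W_x \| W_{p_1}) - \sum_{x'} p_1(x') D(W_{x'} \| W_{p_1})$.

(b)

\[
\begin{aligned}
\sup_{p \in \mathcal{P}_f(\mathcal{X})} I(p, W) &= \sup_{p \in \mathcal{P}_f(\mathcal{X})} \min_{\sigma \in \mathcal{S}(\mathcal{H})} J(p, \sigma, W) \\
&\le \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{p \in \mathcal{P}_f(\mathcal{X})} J(p, \sigma, W) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} \sup_{x \in \mathcal{X}} D(W_x \| \sigma) \\
&\le \sup_{x \in \mathcal{X}} D(W_x \| W_{p_1}) \le I(p_1, W).
\end{aligned}
\]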


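Exercise 4.16 Since $C_c(W^A \otimes W^B) \ge C_c(W^A) + C_c(W^B)$, it is enough to show the
opposite inequality. We will show the inequality only when there exist $\max_{p \in \mathcal{P}_f(\mathcal{X}_A)}
I(p, W^A)$ and $\max_{p \in \mathcal{P}_f(\mathcal{X}_B)} I(p, W^B)$. However, the general case can be shown similarly.
Choose $p_{A,1} \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{p \in \mathcal{P}_f(\mathcal{X}_A)} I(p, W^A)$ and $p_{B,1} \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{p \in \mathcal{P}_f(\mathcal{X}_B)} I(p, W^B)$.
Then, (4.71) implies

\[
\begin{aligned}
C_c(W^A \otimes W^B) &\le \sup_{(x_A, x_B) \in \mathcal{X}_A \times \mathcal{X}_B} D \big( W^A_{x_A} \otimes W^B_{x_B} \big\| W^A_{p_{A,1}} \otimes W^B_{p_{B,1}} \big) \\
&= \sup_{(x_A, x_B) \in \mathcal{X}_A \times \mathcal{X}_B} \Big( D \big( W^A_{x_A} \big\| W^A_{p_{A,1}} \big) + D \big( W^B_{x_B} \big\| W^B_{p_{B,1}} \big) \Big) = C_c(W^A) + C_c(W^B),
\end{aligned}
\]

where (a) follows from (4.84).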


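Exercise 4.17 We can show the existence of the minimum $\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})}
J(p,\sigma,W)$ in the same way as $\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_x D(W_x\|\sigma)$ by using Lemma A.8.
Replacing $\mathcal{P}_f(\mathcal{X})$ by $\mathcal{P}_{c\le K}(\mathcal{X})$, we can show (4.72) in the same way as (4.71).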


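Exercise 4.18 Choose $p_1' := \operatorname{argmax}_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} I(p, W)$. For a distribution
$p \in \mathcal{P}_{c\le K}(\mathcal{X})$, we define the function $f(t) := t\sum_x p(x) D(W_x\|(1-t)W_{p_1'} + tW_p) +
\sum_x (1-t)p_1'(x) D(W_x\|(1-t)W_{p_1'} + tW_p)$. Since $p_1'$ attains the maximum, we have
$0 \ge f'(0) = \sum_x p(x) D(W_x\|W_{p_1'}) - \sum_x p_1'(x) D(W_x\|W_{p_1'}) + \operatorname{Tr}(-W_p + W_{p_1'})
= \sum_x p(x) D(W_x\|W_{p_1'}) - \sum_x p_1'(x) D(W_x\|W_{p_1'})$, which implies
$J(p, W_{p_1'}, W) \le I(p_1', W)$. Thus,

$$\sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} I(p,W) = \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} \min_{\sigma\in\mathcal{S}(\mathcal{H})} J(p,\sigma,W)
\le \min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} J(p,\sigma,W) \le \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} J(p, W_{p_1'}, W) \le I(p_1', W).$$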


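Exercise 4.19 For a given distribution $p$ on $\mathcal{X}_A \times \mathcal{X}_B$, we denote the marginal
distributions of $p$ by $p^A$ and $p^B$. Then, we have

$$J(p^A,\sigma_A,W^A) + J(p^B,\sigma_B,W^B) - J(p,\sigma_A\otimes\sigma_B,W^A\otimes W^B)
= D\bigl((W^A\otimes W^B)_p \,\big\|\, W^A_{p^A}\otimes W^B_{p^B}\bigr) \ge 0. \tag{4.84}$$

Since $C_{c|c_A+c_B\le K}(W^A\otimes W^B) \ge \max_{K'} C_{c|c_A\le K'}(W^A) + C_{c|c_B\le K-K'}(W^B)$, it is enough
to show the opposite inequality. We will show the inequality only when there exist
$\max_{p\in\mathcal{P}_{c_A\le K}(\mathcal{X}_A)} I(p,W^A)$, $\max_{p\in\mathcal{P}_{c_B\le K}(\mathcal{X}_B)} I(p,W^B)$, and
$\max_{p\in\mathcal{P}_{c_A+c_B\le K}(\mathcal{X}_A\times\mathcal{X}_B)} I(p,W^A\otimes W^B)$ for any $K$. However, the general case can be shown
similarly. Choose $p_{AB,1} := \operatorname{argmax}_{p\in\mathcal{P}_{c_A+c_B\le K}(\mathcal{X}_A\times\mathcal{X}_B)} I(p,W^A\otimes W^B)$. Let $K_A$ and $K_B$
be the averages of $c_A$ and $c_B$ under the joint distribution $p_{AB,1}$. Choose
$p_{A,1} := \operatorname{argmax}_{p\in\mathcal{P}_{c_A\le K_A}(\mathcal{X}_A)} I(p,W^A)$ and $p_{B,1} := \operatorname{argmax}_{p\in\mathcal{P}_{c_B\le K_B}(\mathcal{X}_B)} I(p,W^B)$.
Then, (4.72) implies

$$C_{c|c_A+c_B\le K}(W^A\otimes W^B)
\le \sup_{p\in\mathcal{P}_{c_A+c_B\le K}(\mathcal{X}_A\times\mathcal{X}_B)} J\bigl(p, W^A_{p_{A,1}}\otimes W^B_{p_{B,1}}, W^A\otimes W^B\bigr)
\overset{(a)}{\le} \sup_{p\in\mathcal{P}_{c_A+c_B\le K}(\mathcal{X}_A\times\mathcal{X}_B)} \bigl[ J(p^A, W^A_{p_{A,1}}, W^A) + J(p^B, W^B_{p_{B,1}}, W^B) \bigr]
\le \max_{K'} C_{c|c_A\le K'}(W^A) + C_{c|c_B\le K-K'}(W^B),$$

where $p^A$ and $p^B$ are the marginal distributions of $p$, and (a) follows from (4.84).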


Exercise 4.20 First, notice that af (0) < 0. Using the matrix 


Ss 


-1 
T=s I-s 
Fiosips = (Xan) = (x pt) 


Ss 


1 
T-s 
: »(Sn..comy) oh ap 
Y 
we have


df 1 2 -_ 
2 o- (= We F1-s1p., — DoPi-s(e) Tr Wy ‘rin. 


Tr Wy 8 1-sips-. 
[1 (i riakIWe) iS ] 
< YP Tr WG... 
- [t (Spi w,”) a 
_ (Se is) Wy) (Sey Pi-s@) We) oe 
: [ (Se Pia@)W,*) al 


l-s__s -_ 
Tr Wy "O1-sip._, = 


1 ty, I-s 
irs pia ws ys nadie 
_ J" 5= (Tr Da eow 
[1 (Sy Pi-s(@’) Wy) al x! 
which implies (4.63). 
Exercise 4.21 Since the matrix 
w-l 
O1-sp\_, = (Se. 5’) Wy ‘) 
satisfies 
df us 
<70= Elo "Fis, — DA. <) Te WES 1—sp,_, J 
we have 
Tr>. (Wo sip? 
> p@) Tr eee = >: P 1 ae : 
. ies , , l—s\ Ts 
x [7 (Sy Pi_s@) Wy ) | 
1 1l-s 


/ y l-s ~ I-s 

ye Pig) Te Woo sip' 

<2 il = ae ee > p(x’) wis : 
[7 (Sv Ps) We) =| 


which implies (4.68). 
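Exercise 4.22 We can show the existence of the minimums $\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_{x\in\mathcal{X}} D_{1-s}(W_x\|\sigma)$
and $\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_{p\in\mathcal{P}_f(\mathcal{X})} J_{1-s}(p,\sigma,W)$ in the same way as $\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_x
D(W_x\|\sigma)$ by using Lemma A.8. (4.22) implies the first equation in (4.74). Also, the
final equation in (4.74) follows from

$$\inf_{p\in\mathcal{P}_f(\mathcal{X})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s = \inf_{x\in\mathcal{X}} \operatorname{Tr} W_x^{1-s}\sigma^s \quad \text{for } \forall s \in (0,1],$$

$$\sup_{p\in\mathcal{P}_f(\mathcal{X})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s = \sup_{x\in\mathcal{X}} \operatorname{Tr} W_x^{1-s}\sigma^s \quad \text{for } \forall s \in [-1,0).$$

Hence, it is sufficient to show the following:

$$\max_{\sigma\in\mathcal{S}(\mathcal{H})} \inf_{p\in\mathcal{P}_f(\mathcal{X})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s = \inf_{p\in\mathcal{P}_f(\mathcal{X})} \max_{\sigma\in\mathcal{S}(\mathcal{H})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s
\quad \text{for } \forall s \in (0,1], \tag{4.85}$$

$$\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_{p\in\mathcal{P}_f(\mathcal{X})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s = \sup_{p\in\mathcal{P}_f(\mathcal{X})} \min_{\sigma\in\mathcal{S}(\mathcal{H})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s
\quad \text{for } \forall s \in [-1,0). \tag{4.86}$$

Since the function $x \mapsto x^s$ is matrix concave for $s \in (0,1]$, the map
$\sigma \mapsto \operatorname{Tr}[\sum_x p(x) W_x^{1-s}]\sigma^s$ is concave. Hence, Lemma A.9 yields (4.85). Similarly, since
the function $x \mapsto x^s$ is matrix convex for $s \in [-1,0)$, the map $\sigma \mapsto \operatorname{Tr}[\sum_x p(x) W_x^{1-s}]\sigma^s$
is convex. Hence, Lemma A.9 yields (4.86).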


Exercise 4.23 

(a) Since $I_{1-s}(p', W) \le \sup_p I_{1-s}(p, W)$, we have $I(p', W) = \liminf_{s\to 0} I_{1-s}(p', W)
\le \liminf_{s\to 0} \sup_p I_{1-s}(p, W)$. Taking the supremum for $p'$, we have
$\sup_{p'} I(p', W) \le \liminf_{s\to 0} \sup_p I_{1-s}(p, W)$.

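(b) For any $s$ and $x$, there exists a parameter $\tilde{s}(s)$ between $s$ and $0$ such that
$\phi(s|W_x\|\sigma) = \phi(0|W_x\|\sigma) + s\phi'(0|W_x\|\sigma) + \frac{s^2}{2}\phi''(\tilde{s}(s)|W_x\|\sigma)$, i.e.,
$D_{1-s}(W_x\|\sigma) = D(W_x\|\sigma) - \frac{s}{2}\phi''(\tilde{s}(s)|W_x\|\sigma)$. Since (c) of Exercise 3.5 guarantees that

$$\phi''(s|W_x\|\sigma) = \frac{\operatorname{Tr} W_x^{1-s}(\log\sigma - \log W_x)\sigma^s(\log\sigma - \log W_x)}{\operatorname{Tr} W_x^{1-s}\sigma^s}
- \frac{\bigl(\operatorname{Tr} W_x^{1-s}\sigma^s(\log\sigma - \log W_x)\bigr)^2}{\bigl(\operatorname{Tr} W_x^{1-s}\sigma^s\bigr)^2},$$

the quantity $\sup_{|s|\le\epsilon} \sup_x \phi''(s|W_x\|\sigma)$ exists with sufficiently small $\epsilon > 0$. So, the
convergence $D_{1-s}(W_x\|\sigma) \to D(W_x\|\sigma)$ is uniform for $x$.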

(c) The state $\sigma_1$ satisfies the condition in (b). Since $\min_\sigma \sup_x D_{1-s}(W_x\|\sigma) \le
\sup_x D_{1-s}(W_x\|\sigma_1)$, we have $\limsup_{s\to 0} \min_\sigma \sup_x D_{1-s}(W_x\|\sigma) \le \lim_{s\to 0} \sup_x D_{1-s}
(W_x\|\sigma_1) = \sup_x D(W_x\|\sigma_1) = \min_\sigma \sup_x D(W_x\|\sigma)$. Hence, (4.71) and (4.74), together
with (a), imply $\sup_p I(p, W) = \limsup_{s\to 0} \sup_p I_{1-s}(p, W)$.

Exercise 4.24 

(a) Since $\sigma_{1-s|p_{1-s}}$ gives the minimum $\min_\sigma \operatorname{Tr}[\sum_x p_{1-s}(x) W_x^{1-s}]\sigma^s$ due to (4.83),
(4.86) implies (4.75).
(b) The inequality (4.65) follows as 


log Tr cw (n) oe | = »; log Tr] (Wm @) "tn. 
l=1 


a4 —— 
= “log Tr| (W049)! ‘ot sin. 
l=1 
(a) : 1 l-s__s 
<n log > = Tr| (Wong) 1 stn..| 
l=1 


1 =f S 
=n log > (We)! lots 
I=1 


l-s Ss 
<nlog max “bz p(x) (Wy) lots. 


Ail 
(b) = 
<n(1 — s) log | Tr (pow) F 


x 


where (a) follows from the concavity of $x \mapsto \log x$ and (b) follows from (4.75).


Exercise 4.25 This exercise can be solved in the same way as Exercise 4.16 by
replacing the role of (4.71) by that of (4.74).


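Exercise 4.26 We can show the existence of the minimum $\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})}
J_{1-s}(p,\sigma,W)$ in the same way as $\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_x D(W_x\|\sigma)$ by using Lemma A.8.
Due to (4.22), to show (4.74), it is sufficient to show the following:

$$\max_{\sigma\in\mathcal{S}(\mathcal{H})} \inf_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s = \inf_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} \max_{\sigma\in\mathcal{S}(\mathcal{H})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s,
\quad \forall s \in (0,1], \tag{4.87}$$

$$\min_{\sigma\in\mathcal{S}(\mathcal{H})} \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s = \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} \min_{\sigma\in\mathcal{S}(\mathcal{H})} \operatorname{Tr}\Bigl[\sum_x p(x) W_x^{1-s}\Bigr]\sigma^s,
\quad \forall s \in [-1,0). \tag{4.88}$$

Since the function $x \mapsto x^s$ is matrix concave for $s \in (0,1]$, the map
$\sigma \mapsto \operatorname{Tr}[\sum_x p(x) W_x^{1-s}]\sigma^s$ is concave. Hence, Lemma A.9 yields (4.87). Similarly, since
the function $x \mapsto x^s$ is matrix convex for $s \in [-1,0)$, the map $\sigma \mapsto \operatorname{Tr}[\sum_x p(x) W_x^{1-s}]\sigma^s$
is convex. Hence, Lemma A.9 yields (4.88).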


Exercise 4.27 


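(a) Since $I_{1-s}(p', W) \le \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} I_{1-s}(p, W)$ for $p' \in \mathcal{P}_{c\le K}(\mathcal{X})$, we have
$I(p', W) = \liminf_{s\to 0} I_{1-s}(p', W) \le \liminf_{s\to 0} \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} I_{1-s}(p, W)$. Taking the
supremum for $p'$, we have $\sup_{p'\in\mathcal{P}_{c\le K}(\mathcal{X})} I(p', W) \le \liminf_{s\to 0} \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} I_{1-s}(p, W)$.
(b) We replace $\phi(s|W_x\|\sigma)$ by $-sJ_{1-s}(p,\sigma,W)$ in the proof of (b) of Exercise 4.23.
Then, we can show the desired argument.

(c) The state $\sigma_1'$ satisfies the condition in (b). Since $\min_\sigma \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} J_{1-s}(p,\sigma,W)
\le \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} J_{1-s}(p,\sigma_1',W)$, we have

$\limsup_{s\to 0} \min_\sigma \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} J_{1-s}(p,\sigma,W) \le \lim_{s\to 0} \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} J_{1-s}(p,\sigma_1',W)
= \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} J(p,\sigma_1',W) = \min_\sigma \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} J(p,\sigma,W)$. Hence, (4.72) and
(4.77), together with (a), imply $\sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} I(p, W) = \limsup_{s\to 0} \sup_{p\in\mathcal{P}_{c\le K}(\mathcal{X})} I_{1-s}(p, W)$.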


Exercise 4.28 


: s ; a. L-1 ss 
(a) First, notice that £0) < 0. Using the matrix o)_, := S;", = S;", = aj_,, we 
have 


0<F m=, ! Toy G1 » — Tr S}_sG1-5). 


Hence, 


Tr Wi a1, 2 Tr S|_sOp,_, 


Tr Wi ot_, = = a 
oe S ik [Tr S$ fae 


Tr 5\_,S7, Trsi ae Ts\l—s 
= = = 7 = (Tr Si) , 
[Tr S;*]° [Tr $i-4]* 


which implies (4.78). 
(b) Inequality (4.78) shown in (a) implies that 


Tr W, Sot_, < Ur Sie Ti ylos 
Then, we can show that 


=e 
ais Tr Wo) “es < n(1 — s) log(Tr ${=5) 


in the same way as (4.65). Hence, we can show that 
1
| log( —e[6™)) < log(TrS™= y+ aR 


in the same way as (4.67). So, we obtain (4.19). 
Exercise 4.29 


oe ee 
(a) Since the matrix G\_, := Si_,™~" satisfies 


df ed ' Gg 
0<—O= (em Tr Wy °34_, — TS}... 


we have 


Tr>, p(x)W)- ‘ols 
[Trsi_.]}s 


2pe)T W.*o,, = 


_Tt Si ae “5 
TTS’ rays 


— l-s 
=(TrS|_,7), 


—s 


which implies (4.79). 
(b) Inequality (4.79) shown in (a) implies that 


> po) Tr Wo} < (Try. 
for p € Pe<x(#). Then, we can show that 


los T|( Wingy) *o}- | <n(1—s) log(Tr S_,™) 


in the same way as (4.70). Hence, we obtain (4.39). 
Exercise 4.30 
(a) Since $\sigma_{1-s|p'_{1-s}}$ gives the minimum $\min_\sigma \operatorname{Tr}[\sum_x p'_{1-s}(x) W_x^{1-s}]\sigma^s$ due to (4.88),
we have (4.80).
(b) The inequality (4.65) follows as 


Sea FO 7 (@) . 1 = 8 at 
log Tr[ (wi Og ): ots, | < nilog ox 7 Tr| (Wong)! ere 
l=1 


=n log >. (W Om) aie 
i=l 


< : | Ss ; 
<nlog _max, TS pow. Ot _sip! 


—s 
$$\overset{(b)}{\le} n(1-s) \log \operatorname{Tr}\Bigl(\sum_x p'_{1-s}(x) W_x^{1-s}\Bigr)^{\frac{1}{1-s}}, \tag{4.89}$$


where (a) follows from the concavity of $x \mapsto \log x$ and (b) follows from (4.80).
Exercise 4.31 


(a) Apply the Markov inequality to the uniform distribution on the message set
$\{1, \ldots, N_n\}$.


Exercise 4.32 


(a) We denote the $j$-th marginal distribution of $p$ by $p_j$. Then, (4.5) yields
that $\sum_{j=1}^n I(p_j, W^i) \ge I(p, (W^i)^{(n)})$. Equation (4.3) yields that $\sum_{j=1}^n I(p_j, W^i)
\le n I(\sum_{j=1}^n \frac{1}{n} p_j, W^i)$. Hence, the distribution $\sum_{j=1}^n \frac{1}{n} p_j$ satisfies the desired
condition.

(b) Consider an encoder $\varphi^{(n)}$ with the size $N_n$ and the decoder $Y^{(n)}$. Choose the
distribution $p^{(n)}$ on $\mathcal{X}^n$ as $p^{(n)}(x) = \frac{1}{N_n}$ for $x \in \operatorname{Im} \varphi^{(n)}$ and $p^{(n)}(x) = 0$ for
$x \notin \operatorname{Im} \varphi^{(n)}$. Since the error is given as the maximum value for the choice of the
channels $W^1, \ldots, W^M$, Fano inequality (2.35) implies that

$$\log 2 + \varepsilon[\Phi^{(n)}] \log N_n \ge \log N_n - I(Y^{(n)}, p^{(n)}, (W^i)^{(n)}) \tag{4.90}$$

for $i = 1, \ldots, M$. Then, choosing the distribution $p$ on $\mathcal{X}$ according to (a), we have

$$I(Y^{(n)}, p^{(n)}, (W^i)^{(n)}) \le I(p^{(n)}, (W^i)^{(n)}) \le n I(p, W^i). \tag{4.91}$$

Thus, we obtain

$$\frac{1}{n}\bigl(\log 2 + \varepsilon[\Phi^{(n)}] \log N_n\bigr) \ge \frac{1}{n} \log N_n - \min_i I(p, W^i). \tag{4.92}$$

Taking the limit $n \to \infty$, we have

$$\lim_{n\to\infty} \frac{1}{n} \log N_n \le \min_i I(p, W^i). \tag{4.93}$$

That is, we have $C_c(W^1, \ldots, W^M) \le \sup_p \min_{1\le i\le M} I(p, W^i)$.
Chapter 5
State Evolution and Trace-Preserving
Completely Positive Maps


Abstract Until now, we have considered only quantum states and quantum measurements
as quantum concepts. In order to perform information processing with quantum
systems, we should manipulate a wider class of state operations. This chapter
examines what kinds of operations are allowed on quantum systems. The properties of
these operations will also be examined.


5.1 Description of State Evolution in Quantum Systems 


The time evolution over a time $t$ of a closed quantum-mechanical system $\mathcal{H}$ is given
by

$$\rho \mapsto e^{itH} \rho e^{-itH},$$

where $H$ is a Hermitian matrix in $\mathcal{H}$ called the Hamiltonian. However, this is true
only if there is no interaction between the system $\mathcal{H}$ and another system. The state
evolution in the presence of an interaction cannot be written in the above way.
Furthermore, the input system for information processing (i.e., state evolution) is not
necessarily the same as its output system. In fact, in some processes it is crucial
for the input and output systems to be different. Hence, we will denote the input
and output systems by $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively, and investigate the map $\kappa$ from the
set $\mathcal{S}(\mathcal{H}_A)$ of densities on the system $\mathcal{H}_A$ to $\mathcal{S}(\mathcal{H}_B)$, which gives the relationship
between the input and the output (state evolution). First, we require the map $\kappa$ to
satisfy the condition

$$\kappa(\lambda\rho_1 + (1-\lambda)\rho_2) = \lambda\kappa(\rho_1) + (1-\lambda)\kappa(\rho_2)$$

for $1 > \lambda > 0$ and arbitrary $\rho_1, \rho_2 \in \mathcal{S}(\mathcal{H}_A)$. Maps satisfying this property are called
affine maps. Since the space $\mathcal{S}(\mathcal{H}_A)$ is not a linear space, we cannot claim that $\kappa$ is
linear; however, these two conditions are almost equivalent. In fact, we may extend
the map $\kappa$ to a linear map $\tilde{\kappa}$ that maps from the linear space $\mathcal{T}(\mathcal{H}_A)$ of Hermitian
matrices on $\mathcal{H}_A$ to the linear space $\mathcal{T}(\mathcal{H}_B)$ of the Hermitian matrices on $\mathcal{H}_B$ as
follows. Since an arbitrary matrix $X \in \mathcal{T}(\mathcal{H}_A)$ can be written as a linear sum

$$X = \sum_i a^i X_i, \quad a^i \in \mathbb{R} \tag{5.1}$$

by using elements $X_1, \ldots, X_{d^2}$ of $\mathcal{S}(\mathcal{H}_A)$, the map $\tilde{\kappa}$ may be defined as

$$\tilde{\kappa}(X) := \sum_i a^i \kappa(X_i).$$


The affine property guarantees that this definition does not depend on the choice of
the decomposition (5.1). Henceforth, we shall identify $\tilde{\kappa}$ with $\kappa$. The linear combination
of the elements in $\mathcal{T}(\mathcal{H}_A)$ multiplied by complex constants gives the space $\mathcal{M}(\mathcal{H}_A)$ of
the matrices on $\mathcal{H}_A$. Since any element of $\mathcal{M}(\mathcal{H}_A)$ can be written as $Z = X + iY$ with
two Hermitian matrices $X$ and $Y$, $\kappa$ may be extended to a map from the space $\mathcal{M}(\mathcal{H}_A)$
of matrices on $\mathcal{H}_A$ to the space $\mathcal{M}(\mathcal{H}_B)$ of matrices on $\mathcal{H}_B$ as
$\kappa(X + iY) := \kappa(X) + i\kappa(Y)$, which satisfies $\kappa(Z^*) = \kappa(Z)^*$. It is often more convenient
to regard $\kappa$ as a linear map in discussions on its properties; hence, we will often use
$\kappa$ as the linear map from $\mathcal{T}(\mathcal{H}_A)$ to $\mathcal{T}(\mathcal{H}_B)$. Occasionally, it is even more convenient
to treat $\kappa$ as a map from $\mathcal{M}(\mathcal{H}_A)$ to $\mathcal{M}(\mathcal{H}_B)$. We shall state these cases explicitly.

In order to recover the map from $\mathcal{S}(\mathcal{H}_A)$ to $\mathcal{S}(\mathcal{H}_B)$ from the linear map $\kappa$ from
$\mathcal{T}(\mathcal{H}_A)$ to $\mathcal{T}(\mathcal{H}_B)$, we assume that the linear map transforms positive semidefinite
matrices to positive semidefinite matrices. Such a map is called a positive map. The
trace also needs to be preserved.

However, there are still more conditions that the state evolution must satisfy. In
fact, we consider the state evolution $\kappa$ occurring on the quantum system $\mathcal{H}_A$ whose
state is entangled with another system $\mathbb{C}^n$. We also suppose that the additional system
$\mathbb{C}^n$ is stationary and has no state evolution. Then, any state on the composite system
of $\mathbb{C}^n$ and $\mathcal{H}_A$ obeys the state evolution from $\mathcal{H}_A \otimes \mathbb{C}^n$ to $\mathcal{H}_B \otimes \mathbb{C}^n$, which is given
by the linear map $\kappa \otimes \iota_n$ from $\mathcal{T}(\mathcal{H}_A \otimes \mathbb{C}^n) = \mathcal{T}(\mathcal{H}_A) \otimes \mathcal{T}(\mathbb{C}^n)$ to
$\mathcal{T}(\mathcal{H}_B \otimes \mathbb{C}^n) = \mathcal{T}(\mathcal{H}_B) \otimes \mathcal{T}(\mathbb{C}^n)$, where $\iota_n$ denotes the identity operator from $\mathcal{T}(\mathbb{C}^n)$
to itself. The map $\kappa \otimes \iota_n$ then must satisfy the positivity and trace-preserving
properties, as discussed above. In this case, the system $\mathbb{C}^n$ is called the reference
system. The trace-preserving property follows from the trace-preserving property of $\kappa$:

$$\operatorname{Tr}(\kappa \otimes \iota_n)\Bigl(\sum_i X_i \otimes Y_i\Bigr) = \sum_i \operatorname{Tr}\bigl(\kappa(X_i) \otimes \iota_n(Y_i)\bigr)
= \sum_i \operatorname{Tr}\kappa(X_i) \cdot \operatorname{Tr}\iota_n(Y_i) = \sum_i \operatorname{Tr} X_i \cdot \operatorname{Tr} Y_i = \operatorname{Tr}\Bigl(\sum_i X_i \otimes Y_i\Bigr).$$

However, as shown by the counterexample given in Example 5.7 of Sect. 5.2, it is not
possible to deduce that $\kappa \otimes \iota_n$ is a positive map from the fact that $\kappa$ is a positive map.
In a composite system involving an $n$-dimensional reference system $\mathbb{C}^n$, the map $\kappa$
is called an $n$-positive map if $\kappa \otimes \iota_n$ is a positive map. If $\kappa$ is an $n$-positive map for
arbitrary $n$, the map $\kappa$ is called a completely positive map, which we abbreviate
to CP map. Since the trace of a density matrix is always 1, the state evolution of a
quantum system is given by a trace-preserving completely positive map, which is
abbreviated to TP-CP map. It is currently believed that it is, in principle, possible
to produce state evolutions corresponding to arbitrary TP-CP maps, as is shown by
Theorem 5.1 discussed later. If a channel has a quantum input system as well as
a quantum output system, it can be represented by a TP-CP map. Such channels
are called quantum-quantum channels to distinguish them from classical-quantum
channels. Strictly speaking, $\kappa$ is a linear map from $\mathcal{T}(\mathcal{H}_A)$ to $\mathcal{T}(\mathcal{H}_B)$; however, for
simplicity, we will call it a TP-CP map from the quantum system $\mathcal{H}_A$ to the quantum
system $\mathcal{H}_B$. In particular, since the case with a pure input state is important, we
abbreviate $\kappa(|x\rangle\langle x|)$ to $\kappa(x)$.
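The gap between positivity and complete positivity can be checked numerically. The following is a minimal sketch, assuming the transpose map $\tau(X) = X^T$ on a qubit as the test case (a standard counterexample; the text itself defers its counterexample to Example 5.7). All function names are illustrative. It applies $\tau \otimes \iota_2$ to a maximally entangled state and exhibits a vector with negative expectation value, so $\tau$ is positive on the sample state but not 2-positive.

```python
import math

def kron(A, B):
    """Kronecker product of two square matrices stored as nested lists."""
    m = len(B)
    size = len(A) * m
    return [[A[r // m][c // m] * B[r % m][c % m] for c in range(size)]
            for r in range(size)]

def unit(d, i, j):
    """Matrix unit |e_i><e_j| in dimension d."""
    E = [[0.0] * d for _ in range(d)]
    E[i][j] = 1.0
    return E

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_psd_2x2(A, tol=1e-12):
    """A Hermitian 2x2 matrix is positive semidefinite iff trace and determinant are >= 0."""
    tr = (A[0][0] + A[1][1]).real
    det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]).real
    return tr >= -tol and det >= -tol

d = 2
# (tau (x) iota_2)(|Phi_2><Phi_2|) = (1/d) sum_{i,j} tau(|e_i><e_j|) (x) |e_i><e_j|
out = [[0.0] * (d * d) for _ in range(d * d)]
for i in range(d):
    for j in range(d):
        term = kron(transpose(unit(d, i, j)), unit(d, i, j))
        out = [[out[r][c] + term[r][c] / d for c in range(d * d)]
               for r in range(d * d)]

# Expectation value of the output in the vector (|01> - |10>)/sqrt(2).
v = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
witness = sum(v[r] * out[r][c] * v[c] for r in range(4) for c in range(4))

# The transpose keeps this sample density matrix positive semidefinite.
rho = [[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]]
print(is_psd_2x2(rho), is_psd_2x2(transpose(rho)))
print(witness)  # negative, so the extended map is not positive
```

The single sample state only illustrates positivity of $\tau$; the general statement holds because transposition preserves eigenvalues.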
In fact, it is often convenient to discuss the adjoint map $\kappa^*$ defined as

$$\operatorname{Tr}\kappa(X)Y^* = \operatorname{Tr} X\kappa^*(Y)^*, \quad \forall X \in \mathcal{M}(\mathcal{H}_A), \forall Y \in \mathcal{M}(\mathcal{H}_B). \tag{5.2}$$

That is, $\kappa^*$ can be regarded as the dual map with respect to the inner product
$\langle X, Y\rangle := \operatorname{Tr} XY^*$. Then, the trace-preserving property of $\kappa$ can be translated to the
identity-preserving property of $\kappa^*$. That is, $\kappa$ is trace-preserving if and only if $\kappa^*$ is
identity-preserving due to (5.2) with $Y = I$ (Exercise 5.1).

A map $\kappa$ is positive if and only if the adjoint map $\kappa^*$ is positive because their
conditions are written as $\operatorname{Tr}\kappa(X)Y = \operatorname{Tr} X\kappa^*(Y) \ge 0$ for any non-negative matrices
$X \in \mathcal{T}(\mathcal{H}_A)$ and $Y \in \mathcal{T}(\mathcal{H}_B)$. Similarly, a map $\kappa$ is $n$-positive if and only if the
adjoint map $\kappa^*$ is $n$-positive. Therefore, complete positivity for $\kappa$ is equivalent
to that for $\kappa^*$, and we can discuss the adjoint map $\kappa^*$ instead of the original
map $\kappa$.

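Before proceeding to the analysis of completely positive maps, we discuss 2-positive
maps. A map $\kappa$ is 2-positive if and only if
$\begin{pmatrix} \kappa(A) & \kappa(B^*) \\ \kappa(B) & \kappa(C) \end{pmatrix} \ge 0$ for any matrices
$A$, $B$, and $C$ satisfying the matrix inequality $\begin{pmatrix} A & B^* \\ B & C \end{pmatrix} \ge 0$ in $\mathcal{T}(\mathcal{H}_A \otimes \mathbb{C}^2)$. Now,
we assume that an identity-preserving map $\kappa^*$ is 2-positive. Since
$\begin{pmatrix} X^*X & X^* \\ X & I \end{pmatrix} = \begin{pmatrix} X & I \end{pmatrix}^* \begin{pmatrix} X & I \end{pmatrix} \ge 0$, we find that
$\begin{pmatrix} \kappa^*(X^*X) & \kappa^*(X^*) \\ \kappa^*(X) & I \end{pmatrix} \ge 0$,
which implies (Exercise 5.2)

$$\kappa^*(X^*X) \ge \kappa^*(X^*)\kappa^*(X) = \kappa^*(X)^*\kappa^*(X). \tag{5.3}$$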


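We now give the matrix representation of the linear map from $\mathcal{T}(\mathcal{H}_A)$ to $\mathcal{T}(\mathcal{H}_B)$
and a necessary and sufficient condition for it to be a TP-CP map. We denote the bases
of the quantum systems $\mathcal{H}_A$ and $\mathcal{H}_B$ by $e_1^A, \ldots, e_d^A$ and $e_1^B, \ldots, e_{d'}^B$, respectively. We
define $K(\kappa)$ as a matrix in $\mathcal{H}_A \otimes \mathcal{H}_B$ for $\kappa$ according to

$$K(\kappa)^{(j,l),(i,k)} := \langle e_k^B|\kappa(|e_i^A\rangle\langle e_j^A|)|e_l^B\rangle. \tag{5.4}$$

Let $X = \sum_{i,j} x^{i,j}|e_i^A\rangle\langle e_j^A|$ and $Y = \sum_{k,l} y^{k,l}|e_k^B\rangle\langle e_l^B|$. Then, we can write

$$\operatorname{Tr}\kappa(X)Y = \sum_{i,j,k,l} x^{i,j}y^{k,l}\langle e_l^B|\kappa(|e_i^A\rangle\langle e_j^A|)|e_k^B\rangle = \operatorname{Tr}(X \otimes Y^T)K(\kappa).$$

Now, let $\mathcal{H}_R$ be the space spanned by $e_1^R, \ldots, e_d^R$. Then, the maximally entangled
state $|\Phi_d\rangle := \frac{1}{\sqrt{d}}\sum_i e_i^A \otimes e_i^R$ characterizes this matrix representation of $\kappa$ as

$$(\kappa \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)
= \frac{1}{d}\sum_{i,j}\kappa(|e_i^A\rangle\langle e_j^A|) \otimes |e_i^R\rangle\langle e_j^R|
= \frac{1}{d}\sum_{i,j,k,l} K(\kappa)^{(j,l),(i,k)}|e_k^B, e_i^R\rangle\langle e_l^B, e_j^R|. \tag{5.5}$$

Combining these equations, we have the following representation of the output state

$$\kappa(\rho) = \operatorname{Tr}_R\, d\,(\kappa \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)(I_B \otimes \rho^T), \tag{5.6}$$

where $\rho^T$ is regarded as a state on the reference system $\mathcal{H}_R$ while $\rho$ is a state on
$\mathcal{H}_A$. In the following, we omit $\iota_R$, i.e., abbreviate $(\kappa \otimes \iota_R)$ to $\kappa$. Also, $I_B \otimes \rho^T$ is
simplified to $\rho^T$.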
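The matrix representation can be made concrete on a small example. The sketch below is an illustration under stated assumptions, not the book's construction: it takes a channel given by operators $F_i$ acting as $X \mapsto \sum_i F_i X F_i^*$ (the amplitude-damping qubit channel with an arbitrarily chosen parameter `g`), builds the matrix with entries $K^{(j,l),(i,k)} = \langle e_k|\kappa(|e_i\rangle\langle e_j|)|e_l\rangle$, and checks numerically that $\operatorname{Tr}\kappa(X)Y = \operatorname{Tr}(X \otimes Y^T)K(\kappa)$ and that the partial trace over the output system gives the identity. Function names and the index layout `row = j*d_out + l`, `column = i*d_out + k` are choices made here.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def apply_channel(kraus, X):
    """kappa(X) = sum_i F_i X F_i^*."""
    d_out = len(kraus[0])
    out = [[0j] * d_out for _ in range(d_out)]
    for F in kraus:
        term = matmul(matmul(F, X), dagger(F))
        out = [[out[r][c] + term[r][c] for c in range(d_out)] for r in range(d_out)]
    return out

def choi_matrix(kraus, d, d_out):
    """K[(j,l)][(i,k)] = <e_k| kappa(|e_i><e_j|) |e_l>."""
    K = [[0j] * (d * d_out) for _ in range(d * d_out)]
    for i in range(d):
        for j in range(d):
            E = [[0j] * d for _ in range(d)]
            E[i][j] = 1 + 0j
            out = apply_channel(kraus, E)
            for k in range(d_out):
                for l in range(d_out):
                    K[j * d_out + l][i * d_out + k] = out[k][l]
    return K

def trace_formula(X, Y, K, d, d_out):
    """Tr (X (x) Y^T) K, using (X (x) Y^T)[(i,k)][(j,l)] = X[i][j] * Y[l][k]."""
    total = 0j
    for i in range(d):
        for j in range(d):
            for k in range(d_out):
                for l in range(d_out):
                    total += X[i][j] * Y[l][k] * K[j * d_out + l][i * d_out + k]
    return total

def partial_trace_out(K, d, d_out):
    """Trace over the output system; should be the identity for a TP map."""
    return [[sum(K[j * d_out + k][i * d_out + k] for k in range(d_out))
             for i in range(d)] for j in range(d)]

g = 0.3  # damping parameter, chosen arbitrarily for the illustration
kraus = [[[1, 0], [0, math.sqrt(1 - g)]], [[0, math.sqrt(g)], [0, 0]]]
K = choi_matrix(kraus, 2, 2)

X = [[1, 2 + 1j], [0.5j, -1]]
Y = [[0.3, 1j], [2, 1 - 1j]]
lhs = trace(matmul(apply_channel(kraus, X), Y))
rhs = trace_formula(X, Y, K, 2, 2)
reduced = partial_trace_out(K, 2, 2)
print(abs(lhs - rhs))
print(reduced)
```

The test matrices `X` and `Y` are deliberately non-Hermitian so that a mistake in the index placement would show up as a mismatch between the two traces.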

The definition of the matrix $K(\kappa)$ can be naturally extended to a map from $\mathcal{M}(\mathcal{H}_A)$
to $\mathcal{M}(\mathcal{H}_B)$, as discussed above. Since the matrix $K(\kappa)$ uniquely characterizes the
TP-CP map $\kappa$, as explained in the following theorem, it is called the Choi–Jamiołkowski
matrix of $\kappa$.

Note that $d$ ($d'$) is the dimension of $\mathcal{H}_A$ ($\mathcal{H}_B$) above. $K(\kappa)$ may be used to
characterize the TP-CP map as follows.


Theorem 5.1 The following conditions are equivalent for a linear map $\kappa$ from
$\mathcal{T}(\mathcal{H}_A)$ to $\mathcal{T}(\mathcal{H}_B)$ [1–5].

① $\kappa$ is a TP-CP map.

② The map $\kappa^*$ is a completely positive map and satisfies $\kappa^*(I_B) = I_A$.

③ $\kappa$ is a trace-preserving $\min\{d, d'\}$-positive map.

④ The matrix $K(\kappa)$ in $\mathcal{H}_A \otimes \mathcal{H}_B$ is positive semidefinite and satisfies
$\operatorname{Tr}_B K(\kappa) = I_A$.

⑤ (Stinespring representation) There exists a Hilbert space $\mathcal{H}_C$ identical to $\mathcal{H}_A$,
a pure state $\rho_0 \in \mathcal{S}(\mathcal{H}_B \otimes \mathcal{H}_C)$, and a unitary matrix $U_\kappa$ in $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$
such that $\kappa(\rho) = \operatorname{Tr}_{A,C} U_\kappa(\rho \otimes \rho_0)U_\kappa^*$. Note that the structure of $\mathcal{H}_C$ depends
only on $\mathcal{H}_A$, not on $\kappa$. Only $U_\kappa$ depends on $\kappa$ itself.

⑥ (Choi–Kraus representation) It is possible to express $\kappa$ as $\kappa(\rho) = \sum_i F_i \rho F_i^*$
using $\sum_i F_i^* F_i = I_{\mathcal{H}_A}$, where $F_1, \ldots, F_{dd'}$ are a set of $dd'$ linear maps from $\mathcal{H}_A$
to $\mathcal{H}_B$.

The above conditions are also equivalent to a modified Condition ⑤ (which we call
Condition ⑤′), where $\rho_0$ is not necessarily a pure state, and the dimension of $\mathcal{H}_C$ is
arbitrary. Another equivalent condition is a modification of Condition ⑥ (which we
call Condition ⑥′), where the number of linear maps $\{F_i\}$ is arbitrary.
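The link between the Choi–Kraus form and the adjoint map can be checked on a small example. This is a minimal sketch, assuming the amplitude-damping qubit channel with an arbitrarily chosen parameter `g` as the test case; it uses the standard fact that, for $\kappa(\rho) = \sum_i F_i \rho F_i^*$, the adjoint with respect to $\langle X, Y\rangle = \operatorname{Tr} XY^*$ is $\kappa^*(Y) = \sum_i F_i^* Y F_i$. The code verifies the completeness relation, the duality relation $\operatorname{Tr}\kappa(X)Y^* = \operatorname{Tr} X\kappa^*(Y)^*$, and shows that $\kappa^*$ preserves the identity even though $\kappa$ itself does not.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def mat_sum(mats):
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(M[r][c] for M in mats) for c in range(cols)] for r in range(rows)]

g = 0.3  # damping parameter, chosen arbitrarily for the illustration
kraus = [[[1, 0], [0, math.sqrt(1 - g)]], [[0, math.sqrt(g)], [0, 0]]]

def channel(X):
    """kappa(X) = sum_i F_i X F_i^*."""
    return mat_sum([matmul(matmul(F, X), dagger(F)) for F in kraus])

def adjoint(Y):
    """kappa^*(Y) = sum_i F_i^* Y F_i."""
    return mat_sum([matmul(matmul(dagger(F), Y), F) for F in kraus])

I2 = [[1, 0], [0, 1]]
completeness = adjoint(I2)   # equals sum_i F_i^* F_i
image_of_identity = channel(I2)

X = [[1, 2 + 1j], [0.5j, -1]]
Y = [[0.3, 1j], [2, 1 - 1j]]
lhs = trace(matmul(channel(X), dagger(Y)))
rhs = trace(matmul(X, dagger(adjoint(Y))))

print(completeness)        # identity: kappa is trace-preserving
print(image_of_identity)   # not the identity: kappa is not unital
print(abs(lhs - rhs))
```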


This theorem will be shown in Sect. 5.7. If the input system $\mathcal{H}_A$ and the output
system $\mathcal{H}_B$ are identical to $\mathbb{C}^d$, the channel is called a $d$-dimensional channel. In
this case, the Stinespring representation can be rewritten as follows.


Corollary 5.1 The following conditions are equivalent for a linear map $\kappa$ from
$\mathcal{T}(\mathcal{H}_A)$ to $\mathcal{T}(\mathcal{H}_A)$.

① $\kappa$ is a TP-CP linear map.
② $\kappa(\rho) = \operatorname{Tr}_E V_\kappa(\rho \otimes \rho_0)V_\kappa^*$ for a quantum system $\mathcal{H}_E$, a state $\rho_0 \in \mathcal{S}(\mathcal{H}_E)$, and
an appropriate unitary matrix $V_\kappa$ in $\mathcal{H}_A \otimes \mathcal{H}_E$ for $\kappa$.

It is possible to make the dimension of $\mathcal{H}_E$ less than $d^2$.


The triplet $(\mathcal{H}_C, \rho_0, U_\kappa)$ in ⑤ of Theorem 5.1 is called the Stinespring
representation. The equivalence to ① has been proved by Stinespring for the dual map $\kappa^*$
under general conditions.

The Stinespring representation is important not only as a mathematical representation
theorem, but also in terms of its physical meaning. When the input system is
identical to the output system, as in Corollary 5.1, we can interpret it as a time
evolution under an interaction with an external system $\mathcal{H}_E$. The system $\mathcal{H}_E$ is therefore
called the environment.

When the map $\kappa$ describes a quantum communication channel and the input and
output systems are different, $\mathcal{H}_C$ can be regarded as the communication medium as
in Fig. 5.1. Since the input system $\mathcal{H}_A$ and the communication medium $\mathcal{H}_C$ can be
regarded as parts of the environment of $\mathcal{H}_B$, we may regard $\mathcal{H}_A \otimes \mathcal{H}_C$ as the
environment, which we again denote by $\mathcal{H}_E$. We can then define the map $\kappa^E$ transforming
the initial state in the input system to the final state in the environment as

$$\kappa^E(\rho) := \operatorname{Tr}_B U_\kappa(\rho \otimes \rho_0)U_\kappa^*. \tag{5.7}$$

As shown in the following theorem, the final state in the environment of a Stinespring
representation is unitarily equivalent to that of another Stinespring representation as
long as the initial state of the environment $\rho_0$ is chosen as a pure state. That is, the
state $\kappa^E(\rho)$ essentially does not depend on the Stinespring representation.
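The relation between the output state and the final state of the environment can be illustrated numerically. The sketch below rests on assumptions made here rather than on the text: the channel is given by Choi–Kraus operators $F_i$ (amplitude damping with an arbitrary parameter `g`), and the environment state is computed from the standard dilation $V = \sum_i F_i \otimes |i\rangle$, which yields the matrix elements $\operatorname{Tr} F_j^* F_i \rho$. For a pure input state the joint final state is pure, so the output state and the environment state must share their nonzero eigenvalues; for $2 \times 2$ matrices this is equivalent to equal trace and equal determinant.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def output_state(kraus, rho):
    """kappa(rho) = sum_i F_i rho F_i^*."""
    d_out = len(kraus[0])
    out = [[0j] * d_out for _ in range(d_out)]
    for F in kraus:
        term = matmul(matmul(F, rho), dagger(F))
        out = [[out[r][c] + term[r][c] for c in range(d_out)] for r in range(d_out)]
    return out

def environment_state(kraus, rho):
    """Environment matrix with elements (i, j) = Tr F_j^* F_i rho."""
    n = len(kraus)
    return [[trace(matmul(matmul(dagger(kraus[j]), kraus[i]), rho))
             for j in range(n)] for i in range(n)]

g = 0.3  # damping parameter, chosen arbitrarily for the illustration
kraus = [[[1, 0], [0, math.sqrt(1 - g)]], [[0, math.sqrt(g)], [0, 0]]]
rho = [[0.5, 0.5], [0.5, 0.5]]  # pure input state |+><+|

out_B = output_state(kraus, rho)
out_E = environment_state(kraus, rho)

print(trace(out_B), trace(out_E))  # both equal to 1
print(det2(out_B), det2(out_E))    # equal determinants: same spectrum
```

For this input both determinants equal $g(1-g)/4$, which can be confirmed by hand from the $2 \times 2$ matrices.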


Fig. 5.1 Quantum communication channel: input system $\mathcal{H}_A$, communication medium $\mathcal{H}_C$, output system $\mathcal{H}_B$
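Theorem 5.2 Given two Stinespring representations $(\mathcal{H}_C, \rho_0, U_\kappa)$ and $(\mathcal{H}_C', \rho_0', U_\kappa')$
with the condition $\operatorname{rank}\rho_0 = \operatorname{rank}\rho_0' = 1$, there exists a partial isometry $V$ from
$\mathcal{H}_A \otimes \mathcal{H}_C$ to $\mathcal{H}_A \otimes \mathcal{H}_C'$ such that

$$\operatorname{Tr}_B U_\kappa'(\rho \otimes \rho_0')U_\kappa'^* = V \operatorname{Tr}_B U_\kappa(\rho \otimes \rho_0)U_\kappa^* V^*. \tag{5.8}$$

Proof Consider the reference system $\mathcal{H}_R$ and the maximally entangled state $|\Phi\rangle$
between $\mathcal{H}_A$ and $\mathcal{H}_R$. Then, $U_\kappa(|\Phi\rangle\langle\Phi| \otimes \rho_0)U_\kappa^*$ and $U_\kappa'(|\Phi\rangle\langle\Phi| \otimes \rho_0')U_\kappa'^*$ are
purifications of the same state $\kappa(|\Phi\rangle\langle\Phi|)$. Then, due to Lemma 8.1, there exists a
partial isometry $V$ from $\mathcal{H}_A \otimes \mathcal{H}_C$ to $\mathcal{H}_A \otimes \mathcal{H}_C'$ such that
$V U_\kappa(|\Phi\rangle\langle\Phi| \otimes \rho_0)U_\kappa^* V^* = U_\kappa'(|\Phi\rangle\langle\Phi| \otimes \rho_0')U_\kappa'^*$. Since the output states on the
composite systems with the input state $\rho$ are given as $d \operatorname{Tr}_R U_\kappa(|\Phi\rangle\langle\Phi| \otimes \rho_0)U_\kappa^* \rho^T$ and
$d \operatorname{Tr}_R U_\kappa'(|\Phi\rangle\langle\Phi| \otimes \rho_0')U_\kappa'^* \rho^T$, respectively, due to (5.6), we have (5.8). ∎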


As a characterization for a special TP-CP map, we have the following theorem. 


Theorem 5.3 The following conditions are equivalent for a linear map $\kappa$ from
$\mathcal{T}(\mathcal{H}_A)$ to $\mathcal{T}(\mathcal{H}_B)$.

① The system $\mathcal{H}_B$ can be regarded as a subsystem of $\mathcal{H}_A$ in the following sense. There
exists another system $\mathcal{H}_E$ such that $\mathcal{H}_A = \mathcal{H}_B \otimes \mathcal{H}_E$ and $\kappa(\rho) = \operatorname{Tr}_E U_\kappa \rho U_\kappa^*$ by
choosing a unitary matrix $U_\kappa$ in $\mathcal{H}_B \otimes \mathcal{H}_E$.

② $\kappa^*(X)\kappa^*(Y) = \kappa^*(XY)$ for any two matrices $X, Y \in \mathcal{M}(\mathcal{H}_B)$.


Due to Theorem 5.1, maps $\kappa$ satisfying ① form a special class of TP-CP maps.
This theorem guarantees that the dual $\kappa^*$ of such a TP-CP map is a homomorphism
for matrix algebras. This property is helpful for the later discussion.


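Proof Assume Condition ①. Then, $\kappa^*(X) = U_\kappa^*(X \otimes I_E)U_\kappa$ for $X \in \mathcal{M}(\mathcal{H}_B)$. Hence,
$\kappa^*(X)\kappa^*(Y) = U_\kappa^*(X \otimes I_E)(Y \otimes I_E)U_\kappa = U_\kappa^*(XY \otimes I_E)U_\kappa = \kappa^*(XY)$ for any two
matrices $X, Y \in \mathcal{M}(\mathcal{H}_B)$.

Next, we assume Condition ②. First, we choose a CONS $\{|u_i\rangle\}_i$ of $\mathcal{H}_B$. Since
$\kappa^*(|u_i\rangle\langle u_i|)\kappa^*(|u_i\rangle\langle u_i|) = \kappa^*(|u_i\rangle\langle u_i|)$, $\kappa^*(|u_i\rangle\langle u_i|)$ is a projection. We choose a
basis $\{|v_{k,1}\rangle\}_k$ of the image of $\kappa^*(|u_1\rangle\langle u_1|)$. Next, we define the vectors
$|v_{k,i}\rangle := \kappa^*(|u_i\rangle\langle u_1|)|v_{k,1}\rangle$. Since $\kappa^*(|u_i\rangle\langle u_1|)\kappa^*(|u_i\rangle\langle u_1|)^* = \kappa^*(|u_i\rangle\langle u_i|)$, the set
$\{|v_{k,i}\rangle\}_k$ forms a basis of the image of $\kappa^*(|u_i\rangle\langle u_i|)$. As
$\sum_i \kappa^*(|u_i\rangle\langle u_i|) = \kappa^*(\sum_i |u_i\rangle\langle u_i|) = \kappa^*(I) = I$, the set $\{|v_{k,i}\rangle\}_{k,i}$ forms a basis of
$\mathcal{H}_A$. Now, we define another system $\mathcal{H}_C$ spanned by $\{|w_k\rangle\}_k$ and a unitary
$U: |u_i\rangle \otimes |w_k\rangle \mapsto |v_{k,i}\rangle$. Then, we have $\kappa^*(|u_i\rangle\langle u_j|) = U(|u_i\rangle\langle u_j| \otimes I_C)U^*$. Since
$\mathcal{M}(\mathcal{H}_B)$ is spanned by the matrices $|u_i\rangle\langle u_j|$, we have $\kappa^*(X) = U(X \otimes I_C)U^*$ for
$X \in \mathcal{M}(\mathcal{H}_B)$, which implies ①. ∎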


In the next section, we will use this theorem to obtain a concrete example of a TP-CP
map. In fact, the partial trace and the map $\rho \mapsto \rho \otimes \rho_0$ are TP-CP maps, as is easily
verified from Theorem 5.1 above. Another representation is the output state (5.5) of
the channel $\kappa$ for the maximally entangled state $\Phi_d$ between the input system and the
same-dimensional reference system. This representation has not only mathematical
meaning but also theoretical importance because it is possible to identify the channel
$\kappa$ by identifying this final state (5.5) [6, 7]. Further, using this notation, we can
describe the output state of any input pure state entangled with the reference system,
as well as the output state $\kappa(\rho)$ given in (5.6). From the discussion in (1.22),
any pure entangled state $|X\rangle$ can be described as $(I_A \otimes \sqrt{d}X^T)|\Phi_d\rangle$. Hence,
we have

$$(\kappa \otimes \iota_R)(|X\rangle\langle X|) = (\kappa \otimes \iota_R)\bigl((I_A \otimes \sqrt{d}X^T)|\Phi_d\rangle\langle\Phi_d|(I_A \otimes \sqrt{d}\bar{X})\bigr)
= d(I_B \otimes X^T)(\kappa \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)(I_B \otimes \bar{X}). \tag{5.9}$$

In Condition ⑥, another representation $\{F_i\}$ of the completely positive map is
given and is called the Choi–Kraus representation. From (1.22) the state
$(\kappa \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)$ has the form

$$(\kappa \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|) = \frac{1}{d}\sum_i |F_i\rangle\langle F_i|. \tag{5.10}$$

Hence, when $\{F_i'\}$ is another Choi–Kraus representation of $\kappa$, $F_i'$ is represented as
a linear sum of $\{F_i\}$. As is shown in Appendix 5.7, the TP-CP map $\kappa^E$ can be
characterized by a Choi–Kraus representation as follows.


Lemma 5.1 When $\{F_i\}_{i=1}^{d}$ is a Choi–Kraus representation of $\kappa$, the environment
system is described by $\mathbb{C}^d$ and the matrix elements of $\kappa^E(\rho)$ are given as

$$\kappa^E(\rho)_{i,j} = \operatorname{Tr} F_j^* F_i \rho. \tag{5.11}$$


Using the Choi–Kraus representation, we can characterize extremal points of TP-CP
maps from $\mathcal{H}_A$ to $\mathcal{H}_B$ as follows.


Lemma 5.2 (Choi [3]) A TP-CP map $\kappa$ is an extremal point of TP-CP maps from
$\mathcal{H}_A$ to $\mathcal{H}_B$ if and only if $\kappa$ has a Choi–Kraus representation $\{F_i\}$ such that $\{F_i^* F_j\}_{i,j}$
is a linearly independent set.


Proof Suppose that $\kappa$ is an extremal point. Let $\{F_i\}$ be a Choi–Kraus representation
of $\kappa$ such that the $F_i$ are linearly independent (see (a) of Exercise 5.5). Suppose
$\sum_{i,j} \lambda_{i,j} F_i^* F_j = 0$ and the matrix norm $\|(\lambda_{i,j})\|$ is less than 1. Define $\kappa_\pm$ as
$\kappa_\pm(\rho) := \sum_i F_i \rho F_i^* \pm \sum_{i,j} \lambda_{i,j} F_j \rho F_i^*$. Since $I \pm (\lambda_{i,j}) \ge 0$, $\kappa_\pm$ is a TP-CP map. It
also follows that $\kappa = \frac{1}{2}\kappa_+ + \frac{1}{2}\kappa_-$. Since $\kappa$ is extremal, $\kappa_+ = \kappa$. That is, $\lambda_{i,j} = 0$.
Therefore, $\{F_i^* F_j\}_{i,j}$ is a linearly independent set.

Conversely, suppose that $\{F_i^* F_j\}_{i,j}$ is a linearly independent set. We choose TP-CP
maps $\kappa_1$ and $\kappa_2$ and a real number $0 < \lambda < 1$ such that $\kappa = \lambda\kappa_1 + (1-\lambda)\kappa_2$.
Let $\{F_i^k\}$ be a Choi–Kraus representation of $\kappa_k$. Then, $\kappa$ has the Choi–Kraus
representation $\{\sqrt{\lambda}F_i^1\} \cup \{\sqrt{1-\lambda}F_i^2\}$. Thus, $F_i^1$ is written as $\sum_j \lambda_{i,j} F_j$ (see (b) of
Exercise 5.5). From the condition $\sum_i (F_i^1)^* F_i^1 = I = \sum_i F_i^* F_i$ and the assumption,
$\sum_i \bar{\lambda}_{i,j'}\lambda_{i,j} = \delta_{j',j}$. Hence, we obtain $\kappa = \kappa_1$ (see Exercise 5.4). ∎
Corollary 5.2 When a TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$ is extremal, i.e., a Choi–Kraus
representation satisfies the condition given in Lemma 5.2, the Choi–Kraus representation
has only $d_A$ elements and the image is contained in a subspace of $\mathcal{H}_B$ of dimension
at most $d_A^2$. Further, when $d_A \le d_B$, we can construct a Stinespring representation
with $d_C = 1$.


A Stinespring representation guarantees that the state evolution corresponding to
the TP-CP map $\kappa$ can be implemented by the following procedure. The initial state $\rho_0$
is first prepared on $\mathcal{H}_B \otimes \mathcal{H}_C$; then, the unitary evolution $U_\kappa$ is performed on
$\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$. It is commonly believed that, in principle, state evolutions corresponding
to an arbitrary unitary matrix $U_\kappa$ can be implemented, and hence state evolutions
corresponding to arbitrary TP-CP maps can also, in principle, be implemented.

Let us now consider the case where we are given two TP-CP maps $\kappa$ and $\kappa'$ that
map from the quantum systems $\mathcal{H}_A$, $\mathcal{H}_A'$ to $\mathcal{H}_B$, $\mathcal{H}_B'$, respectively. The state evolution
of the composite system from $\mathcal{H}_A \otimes \mathcal{H}_A'$ to $\mathcal{H}_B \otimes \mathcal{H}_B'$ is given by $\kappa \otimes \kappa'$. One may
wonder whether the map $\kappa \otimes \kappa'$ also satisfies the condition for TP-CP maps. Indeed,
this condition is guaranteed by the following corollary.


Corollary 5.3 Given a linear map $\kappa'$ from $\mathcal{T}(\mathcal{H}_A')$ to $\mathcal{T}(\mathcal{H}_B')$, the following two
conditions are equivalent.

① $\kappa'$ is a TP-CP map.
② $\kappa \otimes \kappa'$ is a TP-CP map when $\kappa$ is a TP-CP map from $\mathcal{H}_A$ to $\mathcal{H}_B$.


As another condition for positive maps, we focus on the tensor product positivity.
A positive map $\kappa$ is called tensor product positive if $\kappa^{\otimes n}$ is positive for any integer
$n$. It follows from the above corollary that any CP map is tensor product positive.


Proof of Corollary 5.3 The proof will be shown based on Condition ④ of Theorem 5.1.
Condition ② is equivalent to the condition that $K(\kappa \otimes \kappa')$, which is equal to
$K(\kappa) \otimes K(\kappa')$, is positive semidefinite. Since $K(\kappa)$ is positive semidefinite, this holds
if and only if $K(\kappa')$ is positive semidefinite. This is then equivalent to Condition ①. ∎


The fact that the dimension of the space of $\rho_0$ is $dd'$ is important in connection with
quantum computation. One of the main issues in quantum computation theory is the
classification of problems based on their computational complexity. One particularly
important class is the class of problems that are solvable in polynomial time with
respect to the input size. This class is called the polynomial class. The classification
depends on whether operations are restricted to unitary time evolutions that use
unitary gates such as C-NOT gates or if TP-CP maps are allowed. However, as
confirmed by Theorem 5.1, TP-CP maps can be simulated by $d(dd')$-dimensional
unitary evolutions. Therefore, it has been shown that the class of problems that can
be solved in polynomial time is still the same [8].¹


¹ More precisely, we can implement only a finite number of unitary matrices in a finite amount of
time. For a rigorous proof, we must approximate the respective TP-CP maps by a finite number of
unitary matrices and evaluate the level of these approximations.
Remark 5.1 The discussion presented here can be extended to a more general physical
system, i.e., the case where the states are given as the duals of a general operator
algebra, e.g., a C*-algebra, a von Neumann algebra, or a CCR algebra.

In the area of operator algebra, the dynamics is given as the map $\kappa$ satisfying
the condition of Theorem 5.3. Although such discussions in the area of operator
algebra contain the infinite-dimensional case, they do not cover the case when the
input system is strictly smaller than the system $\mathcal{H}_B \otimes \mathcal{H}_E$. In such a case, the state
of the system $\mathcal{H}_B \otimes \mathcal{H}_E$ is given as $U_\kappa(\rho \otimes \rho_0)U_\kappa^*$, which is not invertible. Hence,
if an analysis for the dynamics $\kappa$ under the condition of Theorem 5.3 covers only
invertible states, it cannot be extended to the case with general dynamics $\kappa$. This point
is a blind spot when an analysis by operator algebra is employed.


Exercises 


5.1 Show that $\kappa$ is trace-preserving if and only if $\kappa^*$ is identity-preserving by using
(5.2) with $Y = I$.


5.2 Show (5.3) by using $\begin{pmatrix} \kappa^*(X^*X) & \kappa^*(X^*) \\ \kappa^*(X) & I \end{pmatrix} \ge 0$.


5.3 Show Corollary 5.1 using Theorem 5.1.


5.4 Let $\{F_i\}$ be a Choi–Kraus representation of the TP-CP map $\kappa$ and $u_{i,j}$ be a
unitary matrix. Show that $F_i' := \sum_j u_{i,j} F_j$ is also its Choi–Kraus representation.


5.5 Show the following items for a TP-CP map $\kappa$ from $\mathcal{S}(\mathcal{H}_A)$ to $\mathcal{S}(\mathcal{H}_B)$.

(a) Show that there exists a Choi–Kraus representation $\{F_i\}$ of the TP-CP map $\kappa$
such that the matrices $F_i$ are linearly independent. (Hint: Use Exercise 5.4.)

(b) Choose two Choi–Kraus representations $\{F_i\}_i$ and $\{F_j'\}_j$ of the TP-CP map
$\kappa$. Show that $F_j'$ can be written as a linear combination of the $F_i$. (Hint: Use a method
similar to the proof of Theorem 5.2.)


5.2 Examples of Trace-Preserving Completely Positive 
Maps 


In addition to the above-mentioned partial trace, the following examples of TP-CP 
maps exist. 


Example 5.1 (Unitary evolution) Let $U$ be a unitary matrix on $\mathcal{H}$. The state evolution
$\kappa_U: \rho \mapsto \kappa_U(\rho) := U\rho U^*$ from $\mathcal{S}(\mathcal{H})$ to itself is a TP-CP map. This can be easily
verified from Condition ⑥ in Theorem 5.1. Since an arbitrary unitary matrix $U$ has
the form $U = Ve^{i\theta}$, where $e^{i\theta}$ is a complex number with modulus 1 and $V$ is a
unitary matrix with determinant 1, we can write $U\rho U^* = V\rho V^*$. Therefore, we can
restrict $V$ to unitary matrices with determinant 1. Such matrices are called special
unitary matrices. However, there are no such unitary state evolutions when the
dimension of $\mathcal{H}_A$ is smaller than the dimension of $\mathcal{H}_B$. In such a case, we consider
isometric matrices (i.e., matrices satisfying $U^*U = I$) from $\mathcal{H}_A$ to $\mathcal{H}_B$. The TP-CP
map $\kappa_U(\rho) := U\rho U^*$ is then called an isometric state evolution.


Example 5.2 (Partial trace) The partial trace $\rho \mapsto \operatorname{Tr}_{\mathcal{H}} \rho$ can be regarded as a state evolution from the quantum system $\mathcal{H} \otimes \mathcal{H}'$ to the quantum system $\mathcal{H}'$. It is also a completely positive map because it is a special case of Condition @ of Theorem 5.1.


Example 5.3 (Depolarizing channel) For arbitrary $1 \ge \lambda \ge 0$, a map

$$\kappa_{d,\lambda}(\rho) := \lambda\rho + (1-\lambda)(\operatorname{Tr}\rho)\,\rho_{\mathrm{mix}} \tag{5.12}$$

is a $d$-dimensional TP-CP map and is called a depolarizing channel. In particular, when $d = 2$, we have (see Exercise 5.6)

$$\kappa_{2,\lambda}(\rho) = \frac{3\lambda+1}{4}\rho + \frac{1-\lambda}{4}\sum_{i=1}^{3} S_i \rho S_i. \tag{5.13}$$

A depolarizing channel $\kappa_{d,\lambda}$ satisfies $\kappa_{d,\lambda}(U\rho U^*) = U\kappa_{d,\lambda}(\rho)U^*$ for all unitary matrices $U$. Conversely, when a $d$-dimensional channel satisfies this property, it is a depolarizing channel.
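
As a numerical illustration of (5.12) and (5.13), the following sketch compares the two expressions for $d = 2$ and checks the unitary covariance on one random input. The helper names (`depolarize`, `depolarize_pauli`, `random_state`) and the use of NumPy are assumptions made for this illustration only.

```python
import numpy as np

# Pauli matrices S_0..S_3 (S_0 is the identity).
S = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def depolarize(rho, lam):
    """Definition (5.12): lam * rho + (1 - lam) * Tr(rho) * rho_mix."""
    d = rho.shape[0]
    return lam * rho + (1 - lam) * np.trace(rho) * np.eye(d) / d

def depolarize_pauli(rho, lam):
    """Pauli form (5.13); only meaningful for d = 2."""
    out = (3 * lam + 1) / 4 * rho
    for s in S[1:]:
        out = out + (1 - lam) / 4 * (s @ rho @ s)
    return out

def random_state(d, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
rho = random_state(2, rng)
lam = 0.3
same_form = np.allclose(depolarize(rho, lam), depolarize_pauli(rho, lam))

# Covariance: kappa(U rho U^*) = U kappa(rho) U^* for a unitary U.
U, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
covariant = np.allclose(
    depolarize(U @ rho @ U.conj().T, lam),
    U @ depolarize(rho, lam) @ U.conj().T,
)
```

Both flags should evaluate to true; a single random sample is of course only a sanity check, not a proof of Exercise 5.6.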


Example 5.4 (Entanglement-breaking channel) A TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$ satisfying the following condition is called an entanglement-breaking channel: for an arbitrary reference system $\mathcal{H}_C$ and an arbitrary state $\rho \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_C)$, the output state $(\kappa \otimes \iota_C)(\rho)$ on the space $\mathcal{H}_B \otimes \mathcal{H}_C$ is separable.


The entanglement-breaking channel is characterized as follows. 


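Theorem 5.4 (Horodecki et al. [9]) The following two conditions are equivalent for a TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$.

① $\kappa$ is an entanglement-breaking channel.
② $\kappa$ can be written as

$$\kappa(\rho) = \kappa_{M,W}(\rho) := \sum_{\omega \in \Omega} (\operatorname{Tr}\rho M_\omega)\, W_\omega,$$

where $M = \{M_\omega\}_{\omega \in \Omega}$ is an arbitrary POVM on $\mathcal{H}_A$ and $W$ is a map from $\Omega$ to $\mathcal{S}(\mathcal{H}_B)$.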


For a proof, see Exercise 7.5. 

If the states $W_\omega$ are mutually orthogonal pure states, then $\kappa_{M,W}(\rho)$ can be identified with the probability distribution $P_\rho^M$. A POVM $M$ is not only a map that gives the probability distribution from the quantum state $\rho$, but it can also be regarded as an entanglement-breaking channel (and therefore a TP-CP map).

Example 5.5 (Unital channel) A TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$ satisfying $\kappa(\rho^A_{\mathrm{mix}}) = \rho^B_{\mathrm{mix}}$ is called a unital channel. The depolarizing channel defined previously is a unital channel.


Example 5.6 (Pinching) Recall that the pinching $\kappa_M : \rho \mapsto \sum_{\omega \in \Omega} M_\omega \rho M_\omega$ is defined with respect to the PVM $M = \{M_\omega\}_{\omega \in \Omega}$ in Sect. 1.2. This satisfies the conditions for a TP-CP map only when $M$ is a PVM. For a general POVM $M$, the map $\rho \mapsto \sum_{\omega \in \Omega} \sqrt{M_\omega}\, \rho \sqrt{M_\omega}$ is a TP-CP map.

If all the elements $M_\omega$ are one-dimensional, the pinching $\kappa_M$ is an entanglement-breaking channel. If the POVM has a non-one-dimensional element $M_\omega$, it is not an entanglement-breaking channel.


Example 5.7 (Transpose) For a quantum system $\mathcal{H}_A$, we define the transpose operator $\tau$ with respect to its orthonormal basis $u_0, \ldots, u_{d-1}$ as

$$\rho = \sum_{i,j} \rho_{i,j}|u_i\rangle\langle u_j| \;\mapsto\; \tau(\rho) := \rho^T = \sum_{i,j} \rho_{j,i}|u_i\rangle\langle u_j|. \tag{5.14}$$

Then, $\tau$ is a positive map, but not a two-positive map. Therefore, it is not a completely positive map. However, it is a tensor-product-positive map (see Exercise 5.13).


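According to Exercise 1.3, any tensor product state $\rho^A \otimes \rho^B$ satisfies $(\tau_A \otimes \iota_B)(\rho^A \otimes \rho^B) = \tau_A(\rho^A) \otimes \rho^B \ge 0$. Hence, any separable state $\rho \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ also satisfies $(\tau_A \otimes \iota_B)(\rho) \ge 0$. The converse is the subject of the following theorem.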


Theorem 5.5 (Horodecki [10]) Assign orthogonal bases for $\mathcal{H}_A$ and $\mathcal{H}_B$. Let $\tau$ be the transpose with respect to $\mathcal{H}_A$ under these coordinates. If either $\mathcal{H}_A$ or $\mathcal{H}_B$ is two-dimensional and the other is three-dimensional or less, then the condition $(\tau \otimes \iota_B)(\rho) \ge 0$ is the necessary and sufficient condition for the density matrix $\rho$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ to be separable.

Counterexamples are available for $\mathbb{C}^2 \otimes \mathbb{C}^4$ and $\mathbb{C}^3 \otimes \mathbb{C}^3$ [11]. If the input and output systems of $\kappa$ are quantum two-level systems, we have the following corollary.


Corollary 5.4 Let $\tau$ be a transpose under some set of coordinates. If $\kappa$ is a channel for a quantum two-level system (i.e., a TP-CP map), the following two conditions are equivalent.

1. $\tau \circ \kappa$ is a CP map.
2. $\kappa$ is an entanglement-breaking channel.
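
The partial-transpose criterion of Theorem 5.5 is easy to try numerically on two qubits. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the index ordering of the composite system follows NumPy's `kron` convention.

```python
import numpy as np

def partial_transpose_A(rho, dA, dB):
    """Apply (tau_A (x) iota_B): transpose only the H_A indices."""
    r = rho.reshape(dA, dB, dA, dB)          # indices (a, b, a', b')
    return r.transpose(2, 1, 0, 3).reshape(dA * dB, dA * dB)

def min_eig(m):
    """Smallest eigenvalue of the Hermitian part of m."""
    return float(np.linalg.eigvalsh((m + m.conj().T) / 2).min())

# Maximally entangled state on C^2 (x) C^2.
u = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
bell = np.outer(u, u.conj())

# A product (hence separable) state.
rho_A = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
rho_B = np.array([[0.5, 0.1j], [-0.1j, 0.5]], dtype=complex)
product = np.kron(rho_A, rho_B)

bell_min = min_eig(partial_transpose_A(bell, 2, 2))        # negative
product_min = min_eig(partial_transpose_A(product, 2, 2))  # positive
```

The negative eigenvalue for the maximally entangled state also shows concretely that the transpose is not two-positive, as stated in Example 5.7.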


Example 5.8 (Generalized Pauli channel) Define unitary matrices $X_d$ and $Z_d$ using the same basis as that in Example 5.7 for the quantum system $\mathcal{H}_A$ as follows:

$$X_d|u_j\rangle = |u_{j-1 \bmod d}\rangle, \qquad Z_d|u_j\rangle = \omega^j|u_j\rangle, \tag{5.15}$$

where $\omega$ is the $d$th root of 1, i.e., $e^{-2\pi i/d}$. The generalized Pauli channel $\kappa_p^{\mathrm{GP}}$ is given by

$$\kappa_p^{\mathrm{GP}}(\rho) := \sum_{i=0}^{d-1}\sum_{j=0}^{d-1} p(i,j)\,(X_d^i Z_d^j)^* \rho\, (X_d^i Z_d^j) \tag{5.16}$$

for the probability distribution $p$ in $\{0, \ldots, d-1\}^{\times 2}$. We often denote $X_d$ and $Z_d$ by $X_A$ and $Z_A$, respectively, for indicating the space $\mathcal{H}_A$ that these act on. The above channel is also unital. For a quantum two-level system, we can write this channel as

$$\kappa(\rho) = \sum_{i=0}^{3} p_i S_i \rho S_i^*, \tag{5.17}$$

where $p$ is a probability distribution in $\{0, 1, 2, 3\}$, and the Pauli matrices $S_i$ were defined in Sect. 1.3. This is called a Pauli channel and will be denoted by $\kappa_p^{\mathrm{GP}}$.
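
The definitions (5.15) and (5.16) translate directly into code. The following sketch builds $X_d$ and $Z_d$, applies a generalized Pauli channel, and checks two consequences: the uniform distribution maps every matrix to $(\operatorname{Tr} X)\rho_{\mathrm{mix}}$ (cf. Exercise 5.9), and every generalized Pauli channel is unital. The function names and the specific choice of primitive root are assumptions of this illustration.

```python
import numpy as np

def shift_and_clock(d):
    """X_d|u_j> = |u_{j-1 mod d}>, Z_d|u_j> = w^j |u_j>, as in (5.15)."""
    w = np.exp(-2j * np.pi / d)   # assumed choice of primitive d-th root of 1
    X = np.zeros((d, d), dtype=complex)
    for j in range(d):
        X[(j - 1) % d, j] = 1
    Z = np.diag(w ** np.arange(d))
    return X, Z

def generalized_pauli_channel(rho, p):
    """(5.16): sum_{i,j} p[i, j] (X^i Z^j)^* rho (X^i Z^j)."""
    d = rho.shape[0]
    X, Z = shift_and_clock(d)
    out = np.zeros((d, d), dtype=complex)
    for i in range(d):
        for j in range(d):
            V = np.linalg.matrix_power(X, i) @ np.linalg.matrix_power(Z, j)
            out += p[i, j] * (V.conj().T @ rho @ V)
    return out

d = 3
rng = np.random.default_rng(1)
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

# Uniform p: the output is (Tr M) * rho_mix for an arbitrary matrix M.
uniform = np.full((d, d), 1 / d**2)
twirl_ok = np.allclose(generalized_pauli_channel(M, uniform),
                       np.trace(M) * np.eye(d) / d)

# Unitality: rho_mix is a fixed point for any distribution p.
p = rng.random((d, d))
p /= p.sum()
rho_mix = np.eye(d, dtype=complex) / d
unital_ok = np.allclose(generalized_pauli_channel(rho_mix, p), rho_mix)
```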


Example 5.9 (Transpose depolarizing channel (Werner–Holevo channel, Antisymmetric channel)) If and only if a real number $\lambda$ belongs to $[-\frac{1}{d-1}, \frac{1}{d+1}]$, the map

$$\kappa^T_{d,\lambda}(\rho) := \lambda\rho^T + (1-\lambda)\rho_{\mathrm{mix}} \tag{5.18}$$

is a TP-CP map, and it is called a transpose depolarizing channel, where $d$ is the dimension of the system [12] (see Exercise 8.84 and Theorem 5.1). In particular, when $\lambda = -\frac{1}{d-1}$, the channel $\kappa^T_{d,-\frac{1}{d-1}}$ is called an antisymmetric channel [13] or a Werner–Holevo channel [14]. This channel satisfies the anticovariance $\kappa^T_{d,\lambda}(U\rho U^*) = \overline{U}\kappa^T_{d,\lambda}(\rho)U^T$.


Example 5.10 (Phase-damping channel) Let $D = (d_{i,j})$ be a positive semidefinite matrix satisfying $d_{i,i} = 1$. The following channel is called a phase-damping channel:

$$\kappa_D^{\mathrm{PD}}(\rho) := \sum_{i,j} d_{i,j}\rho_{i,j}|u_i\rangle\langle u_j|, \tag{5.19}$$

where $\rho = \sum_{i,j} \rho_{i,j}|u_i\rangle\langle u_j|$.


For example, any pinching $\kappa_M$ with a PVM $M$ is a phase-damping channel. Since

$$(\kappa_D^{\mathrm{PD}} \otimes \iota_R)\Bigl(\frac{1}{d}\sum_{k,l}|u_k, u_k^R\rangle\langle u_l, u_l^R|\Bigr) = \frac{1}{d}\sum_{k,l} d_{k,l}|u_k, u_k^R\rangle\langle u_l, u_l^R|,$$

Condition @ of Theorem 5.1 guarantees that any phase-damping channel $\kappa_D^{\mathrm{PD}}$ is a TP-CP map.

Lemma 5.3 A phase-damping channel $\kappa_D^{\mathrm{PD}}$ is a generalized Pauli channel $\kappa_p^{\mathrm{GP}}$ satisfying that the support of $p$ belongs to the set $\{(0,0), \ldots, (0,d-1)\}$ if and only if

$$d_{i,j} = d_{i',0} \quad \text{for } i - j = i' \bmod d. \tag{5.20}$$

See Exercise 5.14.


Example 5.11 (PNS channel) Define the $n$-fold symmetric space $\mathcal{H}^n_{s,d}$ of $\mathbb{C}^d$ as the space spanned by $\{v^{\otimes n} \mid v \in \mathbb{C}^d\} \subset (\mathbb{C}^d)^{\otimes n}$. Let the input system be the $n$-fold symmetric space $\mathcal{H}^n_{s,d}$ and the output system be the $m$-fold symmetric space $\mathcal{H}^m_{s,d}$ $(n > m)$. The PNS (photon number splitting) channel $\kappa^{\mathrm{pns}}_{d,n\to m}$ is given by

$$\kappa^{\mathrm{pns}}_{d,n\to m}(\rho) := \operatorname{Tr}_{(\mathbb{C}^d)^{\otimes n-m}} \rho, \tag{5.21}$$

where we regard $\rho$ as a state on the $n$-fold tensor product space. In this case, the support of the output state is contained in the $m$-fold symmetric space $\mathcal{H}^m_{s,d}$. Hence, we can check that it is a TP-CP map from the $n$-fold symmetric space $\mathcal{H}^n_{s,d}$ to the $m$-fold symmetric space $\mathcal{H}^m_{s,d}$. Indeed, this channel corresponds to the photon number splitting attack in quantum key distribution.


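Example 5.12 (Erasure channel) Let the input system be $\mathbb{C}^d$ with the basis $u_0, \ldots, u_{d-1}$ and the output system be $\mathbb{C}^{d+1}$ with the basis $u_0, \ldots, u_{d-1}, u_d$. The erasure channel $\kappa^{\mathrm{era}}_{d,p}$ with the probability $p$ is given as

$$\kappa^{\mathrm{era}}_{d,p}(\rho) := (1-p)\rho + p|u_d\rangle\langle u_d|. \tag{5.22}$$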


Exercises 


5.6 Show formula (5.13) for the depolarizing channel on the quantum two-level 
system. 


5.7 Prove Theorem 4.5 from Condition © of Theorem 5.1. 


5.8 Show that $(X_d^{j} Z_d^{k})(X_d^{j'} Z_d^{k'}) = \omega^{jk' - j'k}(X_d^{j'} Z_d^{k'})(X_d^{j} Z_d^{k})$ for symbols defined as in Example 5.8.


5.9 Show that

$$\frac{1}{d^2}\sum_{j=0}^{d-1}\sum_{k=0}^{d-1} (X_d^j Z_d^k)^* X (X_d^j Z_d^k) = (\operatorname{Tr} X)\rho_{\mathrm{mix}} \tag{5.23}$$

for an arbitrary matrix $X$ by following the steps below.

(a) Show that

$$(X_d^{j'} Z_d^{k'})^* \Bigl(\frac{1}{d^2}\sum_{j=0}^{d-1}\sum_{k=0}^{d-1} (X_d^j Z_d^k)^* X (X_d^j Z_d^k)\Bigr)(X_d^{j'} Z_d^{k'}) = \frac{1}{d^2}\sum_{j=0}^{d-1}\sum_{k=0}^{d-1} (X_d^j Z_d^k)^* X (X_d^j Z_d^k)$$

for arbitrary $j'$, $k'$.

(b) Show that the matrix $A = \sum_{j,k} a_{j,k}|u_j\rangle\langle u_k|$ is diagonal if $Z_d A = A Z_d$.
(c) Show that all of the diagonal elements of $A$ are the same if $X_d A = A X_d$.
(d) Show (5.23) using the above.


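5.10 Show that

$$\frac{1}{d^2}\sum_{j=0}^{d-1}\sum_{k=0}^{d-1} (X_A^j Z_A^k \otimes I_B)^* \rho\, (X_A^j Z_A^k \otimes I_B) = \rho^A_{\mathrm{mix}} \otimes \operatorname{Tr}_A \rho$$

for a state $\rho$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ using formula (1.28).
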
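5.11 Let $\mathcal{H}_A$, $\mathcal{H}_B$ be the spaces spanned by $u_1^A, \ldots, u_d^A$ and $u_1^B, \ldots, u_d^B$. Define

$$u_0^{A,B} := \frac{1}{\sqrt{d}}\sum_{i=1}^{d} u_i^A \otimes u_i^B, \qquad u_{i,j}^{A,B} := (X_A^i Z_A^j \otimes I_B)\,u_0^{A,B},$$

and show that these vectors form a CONS of $\mathcal{H}_A \otimes \mathcal{H}_B$.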


5.12 Suppose that the classical-quantum channel $W_\rho$ is given by a depolarizing channel $\kappa_{d,\lambda}$ as $W_\rho := \kappa_{d,\lambda}(\rho)$. In this case, the set of states $\mathcal{S}(\mathcal{H}_A)$ of the input system is regarded as the set of input alphabets $\mathcal{X}$. Show that the depolarizing channel $\kappa_{d,\lambda}$ is pseudoclassical and its capacity is given by


$$C_c(\kappa_{d,\lambda}) = \tilde{C}_c(\kappa_{d,\lambda}) = \frac{1+(d-1)\lambda}{d}\log(1+(d-1)\lambda) + \frac{(d-1)(1-\lambda)}{d}\log(1-\lambda).$$


5.13 Show that the transpose $\tau$ is tensor product positive.


5.14 Show Lemma 5.3 by following the steps below. 

(a) Show that the generalized Pauli channel $\kappa_p^{\mathrm{GP}}$ satisfies (5.19) and (5.20) when the support of $p$ belongs to the set $\{(0,0), \ldots, (0,d-1)\}$.

(b) Assume oe a ae damping channel oo satisfies (5.20). Define p(0, m) = 
4 Tr DX” = + a: 9 dj,ow 4”. Show that CP = eg 

(c) Show Lemma 5.3.

5.15 Show that $(\kappa^{\mathrm{era}}_{d,p})^E = \kappa^{\mathrm{era}}_{d,1-p}$.

5.16 Show that $(\kappa^{\mathrm{pns}}_{d,n\to m})^E = \kappa^{\mathrm{pns}}_{d,n\to n-m}$.


5.3 State Evolutions in Quantum Two-Level Systems 


As mentioned in Sect. 1.3, the states in a quantum two-level system may be parameterized by a three-dimensional vector $x$:

$$\rho_x = \frac{1}{2}\Bigl(S_0 + \sum_{i=1}^{3} x^i S_i\Bigr). \tag{5.24}$$


Let $\kappa$ be an arbitrary TP-CP map from a quantum two-level system to another quantum two-level system. We shall now investigate how this map $\kappa$ can be characterized under the parameterization (5.24). As discussed in Sect. 4.1, this kind of map is characterized by a linear map from the set of Hermitian matrices on $\mathbb{C}^2$ to itself. Consider a state evolution of the unitary type given in Example 5.1. The special unitary matrix $V$ may then be diagonalized by a unitary matrix. The two eigenvalues are complex numbers with absolute value 1, and their product is 1. Therefore, we represent the two eigenvalues by $e^{i\theta}$ and $e^{-i\theta}$ and the two eigenvectors by $u_1$ and $u_2$. We write $V = e^{i\theta}|u_1\rangle\langle u_1| + e^{-i\theta}|u_2\rangle\langle u_2| = \exp(i(\theta|u_1\rangle\langle u_1| - \theta|u_2\rangle\langle u_2|))$. The unitary matrix $V$ may therefore be written as $\exp(iX)$, where $X$ is a Hermitian matrix with trace 0. We will use this description $V = \exp(iX)$ to examine the state evolution when a special unitary matrix $V$ acts on both sides of the density matrix of a quantum two-level system.

Let us examine some algebraic properties of the Pauli matrices. Define $\epsilon_{j,k,l}$ to be 0 if any of $j$, $k$, and $l$ are the same, $\epsilon_{1,2,3} = \epsilon_{3,1,2} = \epsilon_{2,3,1} = 1$, and $\epsilon_{3,2,1} = \epsilon_{1,3,2} = \epsilon_{2,1,3} = -1$. Then, $[S^j, S^k] = 2i\sum_{l=1}^{3} \epsilon_{j,k,l} S^l$. This is equivalent to

$$\Bigl[\sum_{j=1}^{3} x_j S^j, \sum_{k=1}^{3} y_k S^k\Bigr] = 2i \sum_{j=1}^{3}\sum_{k=1}^{3}\sum_{l=1}^{3} x_j y_k \epsilon_{j,k,l} S^l.$$

Defining $R^j := [\epsilon_{j,k,l}]_{k,l}$, $S_x := \sum_{j=1}^{3} x_j S^j$, and $R_x := \sum_{j=1}^{3} x_j R^j$, we may rewrite the above expression as

$$\Bigl[\frac{i}{2}S_x, S_y\Bigr] = S_{R_x y}.$$

As shown later, this equation implies that

$$\exp\Bigl(\frac{i}{2}S_x\Bigr) S_y \exp\Bigl(-\frac{i}{2}S_x\Bigr) = S_{\exp(R_x)y}. \tag{5.25}$$

Applying this equation to states, we obtain

$$\exp\Bigl(\frac{i}{2}S_x\Bigr) \rho_y \exp\Bigl(-\frac{i}{2}S_x\Bigr) = \rho_{\exp(R_x)y}. \tag{5.26}$$

This shows that a $2 \times 2$ unitary matrix $\exp(\frac{i}{2}S_x)$ of determinant 1 corresponds to a $3 \times 3$ real orthogonal matrix $\exp(R_x)$.
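
Identity (5.25) can be checked numerically before reading its proof. The sketch below assumes the standard Pauli matrices, the commutation relation $[S^j, S^k] = 2i\sum_l \epsilon_{j,k,l} S^l$ that holds for them, and the convention $(R_x)_{k,l} = \sum_j x_j \epsilon_{j,k,l}$; the helper `expm_series` is a hypothetical truncated power series, adequate only for matrices of small norm.

```python
import numpy as np

S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
S3 = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = [S1, S2, S3]

def S_vec(x):
    """S_x = sum_j x_j S^j."""
    return sum(xj * Sj for xj, Sj in zip(x, PAULI))

def R_vec(x):
    """R_x with entries (R_x)_{k,l} = sum_j x_j eps_{j,k,l}."""
    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[2, 0, 1] = eps[1, 2, 0] = 1
    eps[2, 1, 0] = eps[0, 2, 1] = eps[1, 0, 2] = -1
    return np.einsum("j,jkl->kl", np.asarray(x, dtype=float), eps)

def expm_series(A, terms=40):
    """Matrix exponential by a truncated power series (small norms only)."""
    out = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for n in range(1, terms):
        term = term @ A / n
        out = out + term
    return out

x = np.array([0.4, -0.7, 0.2])
y = np.array([0.3, 0.5, -0.1])

V = expm_series(0.5j * S_vec(x))           # exp(i/2 S_x), a 2x2 unitary
lhs = V @ S_vec(y) @ V.conj().T
O = expm_series(R_vec(x)).real             # exp(R_x), a 3x3 rotation
rhs = S_vec(O @ y)

identity_ok = np.allclose(lhs, rhs)
orthogonal_ok = np.allclose(O @ O.T, np.eye(3))
```

The second flag confirms that $\exp(R_x)$ is orthogonal, which is the correspondence between $2 \times 2$ special unitaries and $3 \times 3$ rotations described above.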


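Proof of (5.25) Since $S_x$ is Hermitian, the matrix $i\frac{s}{2}S_x$ can be diagonalized by a unitary matrix with purely imaginary eigenvalues. Therefore, $\exp(i\frac{s}{2}S_x)$ is a unitary matrix. Note that $\exp(i\frac{s}{2}S_x)^* = \exp(-i\frac{s}{2}S_x)$. Since $\exp(i\frac{s}{2}S_x) S_y \exp(-i\frac{s}{2}S_x)$ is a Hermitian matrix with trace 0 like $S_y$, it can be rewritten as $S_{y(s)}$ according to Exercise 1.14. Let us write down the vector $y(s)$. Differentiating $\exp(i\frac{s}{2}S_x) S_y \exp(-i\frac{s}{2}S_x)$ with respect to $s$, we obtain

$$\begin{aligned}
S_{y'(s)} &= \Bigl(\exp\bigl(i\tfrac{s}{2}S_x\bigr) S_y \exp\bigl(-i\tfrac{s}{2}S_x\bigr)\Bigr)' \\
&= \Bigl(\exp\bigl(i\tfrac{s}{2}S_x\bigr)\Bigr)' S_y \exp\bigl(-i\tfrac{s}{2}S_x\bigr) + \exp\bigl(i\tfrac{s}{2}S_x\bigr) S_y \Bigl(\exp\bigl(-i\tfrac{s}{2}S_x\bigr)\Bigr)' \\
&= \tfrac{i}{2}S_x \exp\bigl(i\tfrac{s}{2}S_x\bigr) S_y \exp\bigl(-i\tfrac{s}{2}S_x\bigr) + \exp\bigl(i\tfrac{s}{2}S_x\bigr) S_y \exp\bigl(-i\tfrac{s}{2}S_x\bigr)\bigl(-\tfrac{i}{2}S_x\bigr) \\
&= \Bigl[\tfrac{i}{2}S_x, \exp\bigl(i\tfrac{s}{2}S_x\bigr) S_y \exp\bigl(-i\tfrac{s}{2}S_x\bigr)\Bigr] = \Bigl[\tfrac{i}{2}S_x, S_{y(s)}\Bigr] = S_{R_x y(s)},
\end{aligned}$$

and we find that $y(s)$ satisfies the differential equation

$$y'(s) = R_x y(s). \tag{5.27}$$

It can be verified that

$$y(s) = \exp(sR_x)y \tag{5.28}$$

satisfies this differential equation. The uniqueness of the solution of an ordinary differential equation guarantees that only the function $y(s)$ given in (5.28) satisfies $y(0) = y$ and (5.27). Applying this with $s = 1$, we obtain (5.25). ∎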


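Next, we consider a general TP-CP map. Define $\tilde{K}(\kappa)^{i,j} := \frac{1}{2}\operatorname{Tr} S_i \kappa(S_j)$. The trace-preserving property guarantees that

$$\tilde{K}(\kappa)^{0,0} = \frac{1}{2}\operatorname{Tr}\kappa(S_0) = 1, \qquad \tilde{K}(\kappa)^{0,i} = \frac{1}{2}\operatorname{Tr}\kappa(S_i) = 0$$

for $i \ne 0$. Now define $t$ according to $t^i := \tilde{K}(\kappa)^{i,0}$. Then, we have $\kappa(\rho_{\mathrm{mix}}) = \rho_t$. Let $T$ be the $3 \times 3$ matrix $[\tilde{K}(\kappa)^{i,j}]_{1 \le i,j \le 3}$. The TP-CP map $\kappa$ may then be denoted by a vector $t$ and a matrix $T$ as [4]

$$\kappa(\rho_x) = \rho_{Tx+t}. \tag{5.29}$$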


For example, when a unitary matrix operates on either side, then $t = 0$ and $T$ is an orthogonal matrix given by (5.26). To give a few more examples, let us rewrite the examples given in the previous section using $t$ and $T$ for the quantum two-level system. When $\kappa$ is a depolarizing channel, we have $t = 0$ and $T = \lambda I$. The necessary and sufficient condition for the channel to be unital is then $t = 0$. When $\kappa$ is the transpose, we have

$$t = 0, \qquad T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
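
The pair $(T, t)$ of (5.29) can be computed mechanically from any Choi–Kraus representation by evaluating $\frac{1}{2}\operatorname{Tr} S_i \kappa(S_j)$. The sketch below is an illustration under assumptions: the function names are hypothetical, and the amplitude-damping channel is a standard non-unital qubit channel that does not appear in the text and is used here only to show a case with $t \ne 0$.

```python
import numpy as np

S = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def apply_kraus(kraus, X):
    """kappa(X) = sum_i F_i X F_i^* for Kraus operators F_i."""
    return sum(F @ X @ F.conj().T for F in kraus)

def affine_form(kraus):
    """Return (T, t) with kappa(rho_x) = rho_{T x + t}, as in (5.29)."""
    K = np.array([[0.5 * np.trace(S[i] @ apply_kraus(kraus, S[j])).real
                   for j in range(4)] for i in range(4)])
    return K[1:, 1:], K[1:, 0]

# Depolarizing channel with lam = 0.4, written in the Pauli form (5.13).
lam = 0.4
depol = [np.sqrt((3 * lam + 1) / 4) * S[0]] + \
        [np.sqrt((1 - lam) / 4) * s for s in S[1:]]
T_dep, t_dep = affine_form(depol)      # expected: T = lam * I, t = 0

# Amplitude damping with parameter g (illustrative, not from the text).
g = 0.25
amp = [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
       np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]
T_amp, t_amp = affine_form(amp)        # non-unital: t = (0, 0, g)
```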


Next, we consider the necessary and sufficient conditions for a map to be a positive map, a completely positive map, an entanglement-breaking channel, and a Pauli channel, respectively, by assuming that the channel is unital, i.e., $t = 0$. Recall from the discussion in Sect. A.2 that special orthogonal matrices $O_1$, $O_2$ may be chosen such that $T' = O_1 T O_2$ is diagonal (i.e., a singular value decomposition). Taking unitary matrices $U_1$, $U_2$ corresponding to $O_1$, $O_2$ based on the correspondence (5.26), and letting $\kappa_T$ be the TP-CP map corresponding to $T$, we have $\kappa_{U_1} \circ \kappa_T \circ \kappa_{U_2} = \kappa_{T'}$. For the analysis of the TP-CP map $\kappa_T$, it is sufficient to analyze the TP-CP map $\kappa_{T'}$. Now, using the eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$ of $T'$, we give the necessary and sufficient conditions for the above types of channels as follows.


Positivity: Let us first consider a necessary and sufficient condition for a positive map. It is positive if and only if the image by $T'$ of the unit ball $\{x \mid \|x\| \le 1\}$ is contained in the unit ball. Thus, its necessary and sufficient condition is

$$|\lambda_1|, |\lambda_2|, |\lambda_3| \le 1. \tag{5.30}$$


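Complete positivity: Next, we use Condition @ of Theorem 5.1 to examine the completely positive map. In order to check this condition, we calculate $K(\kappa_{T'})$:

$$K(\kappa_{T'}) = \frac{1}{2}\begin{pmatrix} 1+\lambda_3 & 0 & 0 & \lambda_1+\lambda_2 \\ 0 & 1-\lambda_3 & \lambda_1-\lambda_2 & 0 \\ 0 & \lambda_1-\lambda_2 & 1-\lambda_3 & 0 \\ \lambda_1+\lambda_2 & 0 & 0 & 1+\lambda_3 \end{pmatrix}.$$

Swapping the second and fourth coordinates, we have

$$\frac{1}{2}\begin{pmatrix} 1+\lambda_3 & \lambda_1+\lambda_2 & 0 & 0 \\ \lambda_1+\lambda_2 & 1+\lambda_3 & 0 & 0 \\ 0 & 0 & 1-\lambda_3 & \lambda_1-\lambda_2 \\ 0 & 0 & \lambda_1-\lambda_2 & 1-\lambda_3 \end{pmatrix}.$$

Thus, the necessary and sufficient condition for $K(\kappa_{T'}) \ge 0$ is [4, 15]

$$(1+\lambda_3)^2 \ge (\lambda_1+\lambda_2)^2, \qquad (1-\lambda_3)^2 \ge (\lambda_1-\lambda_2)^2. \tag{5.31}$$

Since $1 \pm \lambda_3 \ge 0$ by Condition (5.30), this condition can be rewritten as

$$1 \ge \lambda_1+\lambda_2-\lambda_3,\ \ \lambda_1-\lambda_2+\lambda_3,\ \ -\lambda_1+\lambda_2+\lambda_3, \qquad \lambda_1+\lambda_2+\lambda_3 \ge -1. \tag{5.32}$$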
Entanglement-breaking: Due to Corollary 5.4, the channel $\kappa_{T'}$ is entanglement-breaking if and only if the eigenvalues of $T'\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ satisfy (5.32). Since these eigenvalues are $\lambda_1$, $-\lambda_2$, $\lambda_3$, the following is a necessary and sufficient condition for a channel to be an entanglement-breaking channel [16]:

$$1 \ge |\lambda_1| + |\lambda_2| + |\lambda_3|. \tag{5.33}$$
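
The conditions above are easy to test numerically for a diagonal $T' = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)$ with $t = 0$. The following sketch compares positive semidefiniteness of the $4 \times 4$ matrix $K(\kappa_{T'})$ with the four linear inequalities obtained from (5.31) under (5.30), and evaluates (5.33). The function names are assumptions of this illustration.

```python
import numpy as np

def choi_of_diagonal(l1, l2, l3):
    """K(kappa_T') for T' = diag(l1, l2, l3) and t = 0."""
    return 0.5 * np.array([
        [1 + l3, 0, 0, l1 + l2],
        [0, 1 - l3, l1 - l2, 0],
        [0, l1 - l2, 1 - l3, 0],
        [l1 + l2, 0, 0, 1 + l3],
    ])

def is_cp(l1, l2, l3, tol=1e-12):
    """Complete positivity via positive semidefiniteness of K."""
    return bool(np.linalg.eigvalsh(choi_of_diagonal(l1, l2, l3)).min() >= -tol)

def is_cp_by_inequalities(l1, l2, l3, tol=1e-12):
    """Four linear inequalities equivalent to (5.31) under (5.30)."""
    return bool(max(l1 + l2 - l3, l1 - l2 + l3,
                    -l1 + l2 + l3, -l1 - l2 - l3) <= 1 + tol)

def is_entanglement_breaking(l1, l2, l3, tol=1e-12):
    """Condition (5.33)."""
    return bool(abs(l1) + abs(l2) + abs(l3) <= 1 + tol)

rng = np.random.default_rng(2)
samples = rng.uniform(-1, 1, size=(2000, 3))
agree = all(is_cp(*s) == is_cp_by_inequalities(*s) for s in samples)
```

For instance, `is_cp(1, -1, 1)` is false (the transpose), `is_cp(-1, -1, -1)` is false (the inversion $x \mapsto -x$), and the identity channel `(1, 1, 1)` is completely positive but not entanglement-breaking.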


Pauli channel: Next, we treat the necessary and sufficient condition for a channel to be a Pauli channel. When the channel is a Pauli channel, firstly, we have $t = 0$. When the state evolution is given by the unitaries $S_1$, $S_2$, and $S_3$, the matrix $T$ is given, respectively, as

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Using $p_i$ in (5.17), the matrix $T$ is given by

$$\begin{pmatrix} p_0+p_1-p_2-p_3 & 0 & 0 \\ 0 & p_0-p_1+p_2-p_3 & 0 \\ 0 & 0 & p_0-p_1-p_2+p_3 \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}. \tag{5.34}$$

That is, the real numbers $\lambda_1$, $\lambda_2$, and $\lambda_3$ are characterized as in Fig. 5.2.

Finally, the following theorem holds regarding the pseudoclassical property of channels examined in Sect. 4.7.

Fig. 5.2 Pauli channel


Theorem 5.6 (Fujiwara and Nagaoka [17]) Let $\mathcal{H}$ be two-dimensional, $\mathcal{X}$ be given by $\mathcal{S}(\mathbb{C}^2)$, and $W$ be given by the trace-preserving positive map $\kappa$ from $\mathbb{C}^2$ to $\mathbb{C}^2$. A necessary and sufficient condition for a channel to be pseudoclassical is that one of the conditions given below should be satisfied.
1. $t = 0$.
2. Let $t$ be an eigenvector of $TT^*$. Let $r$ be one of its eigenvalues and $r_0$ be the larger of the other two eigenvalues. Then,


(|lt\| —r) (1 (sues) h (=*)) 
w (HE) | 


ro Sr? = [ltr + 


Exercises 


5.17 Check Condition (5.32) in the Pauli channel case (5.34). 


5.18 Show that the Pauli channel given by (5.34) is entanglement-breaking if and only if $p_i \le \frac{1}{2}$ for $i = 0, 1, 2, 3$.


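5.19 Show that the positive map $\mathrm{Inv}_\lambda : \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \lambda\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} + (1-\lambda)\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is completely positive if and only if $\frac{2}{3} \ge \lambda \ge 0$.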


5.20 Show that

$$F(\rho_x, \rho_y)^2 = \frac{1 + \sqrt{1-|x|^2}\sqrt{1-|y|^2} + \langle x, y\rangle}{2}. \tag{5.35}$$


5.4 Information-Processing Inequalities in Quantum Systems


In this section, we will show that the quantum versions of the information quantities introduced in Sect. 3.1 satisfy the information-processing inequalities (i.e., the monotonicity) under the state evolutions given previously.


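Theorem 5.7 (Lindblad [18], Uhlmann [19]) Let $\kappa$ be a TP-CP map from $\mathcal{H}_A$ to $\mathcal{H}_B$. Then, the monotonicity of the quantum relative entropy

$$D(\rho\|\sigma) \ge D(\kappa(\rho)\|\kappa(\sigma)) \tag{5.36}$$

holds.

This theorem may be used to show many properties of the quantum relative entropy and the von Neumann entropy. For example, let $\rho_1, \ldots, \rho_k$ and $\sigma_1, \ldots, \sigma_k$ be density matrices on $\mathcal{H}$ and let $p_i$ be a probability distribution in $\{1, \ldots, k\}$. Consider now the density matrices

$$R := \begin{pmatrix} p_1\rho_1 & & O \\ & \ddots & \\ O & & p_k\rho_k \end{pmatrix}, \qquad S := \begin{pmatrix} p_1\sigma_1 & & O \\ & \ddots & \\ O & & p_k\sigma_k \end{pmatrix} \tag{5.37}$$

on $\mathcal{H} \otimes \mathbb{C}^k$. Since the partial trace $\operatorname{Tr}_{\mathbb{C}^k}$ is a TP-CP map, the inequality

$$D\Bigl(\sum_{i=1}^{k} p_i\rho_i \Big\| \sum_{i=1}^{k} p_i\sigma_i\Bigr) \le D(R\|S) = \sum_{i=1}^{k} p_i D(\rho_i\|\sigma_i) \tag{5.38}$$

holds [20]. This inequality is called the joint convexity of the quantum relative entropy.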
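
The chain (5.38) can be reproduced numerically. The sketch below is an illustration under assumptions: it uses randomly generated full-rank states so that all logarithms are finite, natural logarithms, and hypothetical helper names (`random_state`, `logm_psd`, `relative_entropy`, `block_diag`).

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (hypothetical helper)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def logm_psd(rho):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr rho (log rho - log sigma); full rank assumed."""
    return float(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma))).real)

def block_diag(blocks):
    """Block-diagonal matrix, as used for R and S in (5.37)."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n), dtype=complex)
    i = 0
    for b in blocks:
        m = b.shape[0]
        out[i:i + m, i:i + m] = b
        i += m
    return out

rng = np.random.default_rng(3)
k, d = 3, 2
p = rng.random(k)
p /= p.sum()
rhos = [random_state(d, rng) for _ in range(k)]
sigmas = [random_state(d, rng) for _ in range(k)]

mixed = relative_entropy(sum(pi * r for pi, r in zip(p, rhos)),
                         sum(pi * s for pi, s in zip(p, sigmas)))
d_RS = relative_entropy(block_diag([pi * r for pi, r in zip(p, rhos)]),
                        block_diag([pi * s for pi, s in zip(p, sigmas)]))
average = sum(pi * relative_entropy(r, s) for pi, r, s in zip(p, rhos, sigmas))
```

Here `mixed` is the left-hand side of (5.38), while `d_RS` and `average` are the two equal expressions on the right.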


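Proof of Theorem 5.7 Examine the connection with hypothesis testing. Let $\kappa$ be a TP-CP map from $\mathcal{H}_A$ to $\mathcal{H}_B$. If a Hermitian matrix $T$ on $\mathcal{H}_B^{\otimes n}$ satisfies $I \ge T \ge 0$, then $(\kappa^{\otimes n})^*(T)$ must also satisfy $I \ge (\kappa^{\otimes n})^*(T) \ge 0$. Indeed, from Condition @ of Theorem 5.1 and Corollary 5.3, we deduce that $(\kappa^{\otimes n})^*(T) \ge 0$. On the other hand, we see that $I \ge (\kappa^{\otimes n})^*(T)$ from

$$I - (\kappa^{\otimes n})^*(T) = (\kappa^{\otimes n})^*(I) - (\kappa^{\otimes n})^*(T) = (\kappa^{\otimes n})^*(I - T) \ge 0.$$

Since a state $\rho \in \mathcal{S}(\mathcal{H}_A)$ satisfies

$$\operatorname{Tr}(\kappa(\rho))^{\otimes n} T = \operatorname{Tr}\rho^{\otimes n}(\kappa^{\otimes n})^*(T),$$

the test $(\kappa^{\otimes n})^*(T)$ with the hypotheses $\rho^{\otimes n}$ and $\sigma^{\otimes n}$ has the same accuracy as the test $T$ with the hypotheses $\kappa(\rho)^{\otimes n}$ and $\kappa(\sigma)^{\otimes n}$. That is, any test with the hypotheses $\kappa(\rho)^{\otimes n}$ and $\kappa(\sigma)^{\otimes n}$ can be simulated by a test with the hypotheses $\rho^{\otimes n}$ and $\sigma^{\otimes n}$ with the same performance. We therefore have

$$B(\rho\|\sigma) \ge B(\kappa(\rho)\|\kappa(\sigma)).$$

Note that $B(\rho\|\sigma)$ is defined in Theorem 3.3. Hence, applying Theorem 3.3 completes the proof. ∎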


Indeed, this proof requires only the tensor product positivity. Hence, since the transpose $\tau$ is tensor product positive, inequality (5.36) holds when $\kappa$ is the transpose $\tau$. Uhlmann [19] showed this inequality only with the two-positivity. This argument will be shown with a more general form in Theorem 6.12 in Sect. 6.7.1. Further, the equality condition of (5.36) can be characterized as follows.


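Theorem 5.8 For a TP-CP map $\kappa$, we assume that $D(\rho\|\sigma) < \infty$. Then, the ranges of $\sigma$ and $\kappa(\sigma)$ contain those of $\rho$ and $\kappa(\rho)$, respectively. Then, the following conditions are equivalent.

① The equality of (5.36) holds for a state $\rho$.
② The following relation holds.

$$\rho = \sqrt{\sigma}\,\kappa^*\bigl((\sqrt{\kappa(\sigma)})^{-1}\kappa(\rho)(\sqrt{\kappa(\sigma)})^{-1}\bigr)\sqrt{\sigma}. \tag{5.39}$$

③ The relation $P_\sigma\kappa^*(\kappa(\sigma)^{-t}\kappa(\rho)^{t}) = \sigma^{-t}\rho^{t}$ holds for any $t > 0$.
Here, we use the generalized inverse.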


Theorem 5.8 will be shown with a more general form as Corollary 6.1 in Sect. 6.7. 
Now, using Theorem 5.8, we show Theorem 3.6. 


Proof of Theorem 3.6 First, we show @=>@. Since Dey ||P“) = D(Km(p) 
|Ku(o)) and K3,(p’) = p’, Condition © of Theorem 5.8 with t = | implies that 


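$$\sigma^{-1}\rho = \kappa_M^*\bigl(\kappa_M(\sigma)^{-1} P_{\kappa_M(\sigma)}\kappa_M(\rho)\bigr) = \sum_{i:\operatorname{Tr}M_i\sigma > 0} \frac{\operatorname{Tr}M_i\rho}{\operatorname{Tr}M_i\sigma} M_i. \tag{5.40}$$

Taking the adjoint, we have

$$\rho\sigma^{-1} = \sum_{i:\operatorname{Tr}M_i\sigma > 0} \frac{\operatorname{Tr}M_i\rho}{\operatorname{Tr}M_i\sigma} M_i. \tag{5.41}$$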

So, we have (3.120). Equation (3.120) implies the commutativity, i.e., $[\sigma, \rho] = 0$. Thus, we obtain ②.

Next, we show ②⇒①. Equation (3.120) and the commutativity imply that $D(\rho\|\sigma) = \operatorname{Tr}\rho\log(\sum_i a_i M_i) = \operatorname{Tr}\rho\sum_i (\log a_i) M_i = \sum_i \log a_i \operatorname{Tr}\rho M_i$, which equals $D(P_\rho^M\|P_\sigma^M)$. ∎

Now, we consider the lower bounds of $D(\rho\|\sigma)$ as

$$D_{c,p}(\rho\|\sigma) := \max_{M:\mathrm{PVM}} D(P_\rho^M\|P_\sigma^M), \qquad D_c(\rho\|\sigma) := \max_{M:\mathrm{POVM}} D(P_\rho^M\|P_\sigma^M). \tag{5.42}$$

When $D(\rho\|\sigma) < \infty$, since the function $M \mapsto D(P_\rho^M\|P_\sigma^M)$ is continuous and the set of PVMs is compact, the maximum for PVM exists. Also, Lemma A.11 guarantees the existence of the maximum for POVM.

In general, these quantities do not satisfy the additivity (3.106). By applying (2.29) to distributions $(\langle u_i|\rho|u_i\rangle)$ and $(\langle u_i|\sigma|u_i\rangle)$, the quantity $D_{c,p}(\rho\|\sigma)$ can be written as [21]

$$D_{c,p}(\rho\|\sigma) = \max_{(\lambda_i),\{u_i\}} \operatorname{Tr}\rho\sum_{i=1}^{k}\lambda_i|u_i\rangle\langle u_i| - \log\operatorname{Tr}\sigma\sum_{i=1}^{k} e^{\lambda_i}|u_i\rangle\langle u_i| = \max_X \operatorname{Tr}\rho X - \log\operatorname{Tr}\sigma e^X, \tag{5.43}$$

where $\{u_i\}$ is a CONS and $X$ is a Hermitian matrix.
Theorem 5.9

$$D_c(\rho\|\sigma) = D_{c,p}(\rho\|\sigma) = \max_X \operatorname{Tr}\rho X - \log\operatorname{Tr}\sigma e^X. \tag{5.44}$$


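Proof Choose the optimal POVM $M$ such that $D_c(\rho\|\sigma) = D(P_\rho^M\|P_\sigma^M)$. For the POVM $M$, we take the Naimark extension $(\mathcal{H}_B, \rho_0, E)$ given in Theorem 4.5. Then, we have

$$D_c(\rho\|\sigma) = D_{c,p}(\rho\otimes\rho_0\|\sigma\otimes\rho_0) = \max_{X'} \operatorname{Tr}(\rho\otimes\rho_0)X' - \log\operatorname{Tr}(\sigma\otimes\rho_0)e^{X'}. \tag{5.45}$$

Now, we choose the matrix $X'$ attaining the maximum in (5.45). Since $\log x$ is matrix concave, Corollary A.1 guarantees that $X := \log\operatorname{Tr}_B(I\otimes\rho_0)e^{X'} \ge \operatorname{Tr}_B(I\otimes\rho_0)\log e^{X'} = \operatorname{Tr}_B(I\otimes\rho_0)X'$. Then, we have $\operatorname{Tr}\rho X \ge \operatorname{Tr}(\rho\otimes\rho_0)X'$. Since $\operatorname{Tr}\sigma e^X = \operatorname{Tr}\sigma\operatorname{Tr}_B(I\otimes\rho_0)e^{X'} = \operatorname{Tr}(\sigma\otimes\rho_0)e^{X'}$, we have $D_{c,p}(\rho\|\sigma) \ge D_c(\rho\|\sigma)$. ∎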


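Substituting $\log\sigma^{-\frac{1}{2}}\rho\sigma^{-\frac{1}{2}}$ into $X$ in (5.44), we obtain [21]

$$D_c(\rho\|\sigma) \ge \operatorname{Tr}\rho\log\sigma^{-\frac{1}{2}}\rho\sigma^{-\frac{1}{2}}. \tag{5.46}$$

Similarly, substituting $2\log\sigma^{-\frac{1}{2}}(\sigma^{\frac{1}{2}}\rho\sigma^{\frac{1}{2}})^{\frac{1}{2}}\sigma^{-\frac{1}{2}}$ into $X$, we obtain

$$D_c(\rho\|\sigma) \ge 2\operatorname{Tr}\rho\log\sigma^{-\frac{1}{2}}(\sigma^{\frac{1}{2}}\rho\sigma^{\frac{1}{2}})^{\frac{1}{2}}\sigma^{-\frac{1}{2}}. \tag{5.47}$$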

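Now, using (5.46), we show the Golden–Thompson trace inequality [21].

Lemma 5.4 (Golden–Thompson trace inequality [22–24]) Any two Hermitian matrices $A$ and $B$ satisfy

$$\operatorname{Tr} e^A e^B \ge \operatorname{Tr} e^{A+B}. \tag{5.48}$$

Proof It is sufficient to show the case when $\operatorname{Tr} e^B = 1$. We choose $\rho = e^{A+B}/\operatorname{Tr} e^{A+B}$ and $\sigma = e^B$. Then, (5.43) implies that

$$\begin{aligned}
\log\operatorname{Tr} e^B e^A &= \operatorname{Tr}\rho A - \operatorname{Tr}\rho A + \log\operatorname{Tr}\sigma e^A \ge \operatorname{Tr}\rho A - D_{c,p}(\rho\|\sigma) \\
&\ge \operatorname{Tr}\rho A - D(\rho\|\sigma) = \operatorname{Tr}\rho(A - \log\rho + \log\sigma) \\
&= \operatorname{Tr}\rho\bigl(A - (A + B - \log\operatorname{Tr} e^{A+B}) + B\bigr) = \log\operatorname{Tr} e^{A+B}.
\end{aligned}$$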
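
A quick numerical sanity check of (5.48) on random Hermitian matrices is sketched below. The helper names are hypothetical, and the matrix exponential is computed through the spectral decomposition, which is valid because the inputs are Hermitian.

```python
import numpy as np

def random_hermitian(d, rng):
    """Random Hermitian matrix (hypothetical helper)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (g + g.conj().T) / 2

def expm_hermitian(H):
    """exp(H) for Hermitian H via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(w)) @ v.conj().T

rng = np.random.default_rng(4)
gaps = []
for _ in range(200):
    A, B = random_hermitian(4, rng), random_hermitian(4, rng)
    lhs = np.trace(expm_hermitian(A) @ expm_hermitian(B)).real
    rhs = np.trace(expm_hermitian(A + B)).real
    gaps.append(lhs - rhs)        # Tr e^A e^B - Tr e^{A+B}
min_gap = min(gaps)
```

The smallest gap over the samples should be nonnegative up to rounding; equality is approached when $A$ and $B$ nearly commute.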


As will be shown in Corollary 8.4 of Sect. 8.2, the Bures distance $b(\rho, \sigma)$ also satisfies the monotonicity [25–27]

$$b(\rho, \sigma) \ge b(\kappa(\rho), \kappa(\sigma)) \tag{5.49}$$

with respect to an arbitrary TP-CP map $\kappa$. From (5.49) we may also show its joint convexity

$$b^2\Bigl(\sum_{i=1}^{k} p_i\rho_i, \sum_{i=1}^{k} p_i\sigma_i\Bigr) \le b^2(R, S) = \sum_{i=1}^{k} p_i b^2(\rho_i, \sigma_i) \tag{5.50}$$

in a similar way to (5.38). The variational distance $d_1(\rho, \sigma)$ also satisfies the monotonicity

$$d_1(\rho, \sigma) \ge d_1(\kappa(\rho), \kappa(\sigma)) \tag{5.51}$$


for an arbitrary TP-CP map $\kappa$ [28]. Furthermore, as extensions of (3.19) and (3.20), the monotonicities

$$\phi(s|\rho\|\sigma) \le \phi(s|\kappa(\rho)\|\kappa(\sigma)) \quad \text{for } 0 \le s \le 1 \tag{5.52}$$
$$\phi(s|\rho\|\sigma) \ge \phi(s|\kappa(\rho)\|\kappa(\sigma)) \quad \text{for } -1 \le s \le 0 \tag{5.53}$$
$$\tilde{\phi}(s|\rho\|\sigma) \le \tilde{\phi}(s|\kappa(\rho)\|\kappa(\sigma)) \quad \text{for } 0 \le s \le \tfrac{1}{2} \tag{5.54}$$
$$\tilde{\phi}(s|\rho\|\sigma) \ge \tilde{\phi}(s|\kappa(\rho)\|\kappa(\sigma)) \quad \text{for } s \le 0 \tag{5.55}$$

hold. The relations (5.52) and (5.53) will be proved in Appendix A.4 by using matrix convex or concave functions. For a proof of Relation (5.55), see Exercise 5.21. We omit the proof of (5.54), which is given in [29]. Notice that Inequality (5.53) does not hold in general with the parameter $s \in (-\infty, -1)$, as shown in Exercise A.16.

The inequalities (5.52), (5.53), (5.54), and (5.55) are rewritten as the monotonicity of the quantum relative Rényi entropies

$$D_{1-s}(\rho\|\sigma) \ge D_{1-s}(\kappa(\rho)\|\kappa(\sigma)) \quad \text{for } -1 \le s \le 1 \tag{5.56}$$
$$\underline{D}_{1-s}(\rho\|\sigma) \ge \underline{D}_{1-s}(\kappa(\rho)\|\kappa(\sigma)) \quad \text{for } s \le \tfrac{1}{2}. \tag{5.57}$$

As the limit $s \to -\infty$, we have

$$D_{\max}(\rho\|\sigma) \ge D_{\max}(\kappa(\rho)\|\kappa(\sigma)). \tag{5.58}$$
Exercises 
5.21 Show (5.55) by using Exercise 3.58. 


5.22 Show that the equation $\phi(s|\rho\|\sigma) = \tilde{\phi}(s|\rho\|\sigma)$ does not hold for $s \le -1$ in general by following the steps below.

(a) Derive inequality (5.53) for $s \le -1$ by assuming $\phi(s|\rho\|\sigma) = \tilde{\phi}(s|\rho\|\sigma)$ for $s \le -1$.

(b) Show the above argument by using Exercise A.16.


5.23 Show the monotonicity of transmission information

$$I(p, W) \ge I(p, \kappa(W)) \tag{5.59}$$

for any TP-CP map $\kappa$ and any c-q channel $W = (W_x)$, where $\kappa(W) = (\kappa(W_x))$.


5.24 Let $W$, $\kappa$, and $\sigma$ be a c-q channel, a TP-CP map, and a quantum state, respectively. Define the c-q channel $\kappa(W) : x \mapsto \kappa(W_x)$. Show the following inequalities for $s \in [-1, 1] \setminus \{0\}$ by using (4.74).


J(p, Ka), K(W)) < J(p,o, W) (5.60) 
Ji+s(p, KC), K(W)) < Si4s(p, 0, W) (5.61) 
I(p, K(W)) < I(p, W) (5.62) 

Ti+s(p, &(0), KW)) < Ni4s(p, 0, W) (5.63) 
Ti, (p, K(a), K(W)) < Li, (p, o,W) (5.64) 
Ce(K(W)) < CW) (5.65) 
Ch,.(K(W)) < Cy,,(W). (5.66) 


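5.25 Extend the definition of $D_{1+s}(\rho\|\sigma)$ and $\underline{D}_{1+s}(\rho\|\sigma)$ by $\frac{1}{s}\log\operatorname{Tr}\rho^{1+s}\sigma^{-s}$ (3.9) and $\frac{1}{s}\log\operatorname{Tr}(\sigma^{-\frac{s}{2(1+s)}}\rho\,\sigma^{-\frac{s}{2(1+s)}})^{1+s}$ (3.13) to the case when $\sigma$ satisfies only the condition $\sigma \ge 0$ although $\rho$ satisfies the conditions $\rho \ge 0$ and $\operatorname{Tr}\rho = 1$. In this definition, the case with $s = 0$ is given with the limit $s \to 0$. Show the following items under this extension.

(a) Any TP-CP map $\kappa$ satisfies

$$D_{1+s}(\kappa(\rho)\|\kappa(\sigma)) \le D_{1+s}(\rho\|\sigma) \quad \text{for } s \in [-1, 1] \tag{5.67}$$
$$\underline{D}_{1+s}(\kappa(\rho)\|\kappa(\sigma)) \le \underline{D}_{1+s}(\rho\|\sigma) \quad \text{for } s \in [-\tfrac{1}{2}, \infty). \tag{5.68}$$

(b) When a projection $P$ satisfies $P\rho P = \rho$,

$$D_{1+s}(\rho\|P\sigma P) \le D_{1+s}(\rho\|\sigma) \quad \text{for } s \in [-1, 1] \tag{5.69}$$
$$\underline{D}_{1+s}(\rho\|P\sigma P) \le \underline{D}_{1+s}(\rho\|\sigma) \quad \text{for } s \in [-\tfrac{1}{2}, \infty). \tag{5.70}$$

In particular, the equality holds when $\sigma = P\sigma P + (I - P)\sigma(I - P)$.

(c) Any constant $c$ satisfies

$$D_{1+s}(\rho\|c\sigma) = D_{1+s}(\rho\|\sigma) - \log c \tag{5.71}$$
$$\underline{D}_{1+s}(\rho\|c\sigma) = \underline{D}_{1+s}(\rho\|\sigma) - \log c. \tag{5.72}$$

(d) Any isometry $U$ satisfies

$$D_{1+s}(U\rho U^*\|U\sigma U^*) = D_{1+s}(\rho\|\sigma) \tag{5.73}$$
$$\underline{D}_{1+s}(U\rho U^*\|U\sigma U^*) = \underline{D}_{1+s}(\rho\|\sigma). \tag{5.74}$$

(e) When $\sigma \le \sigma'$,

$$D_{1+s}(\rho\|\sigma) \ge D_{1+s}(\rho\|\sigma') \tag{5.75}$$
$$\underline{D}_{1+s}(\rho\|\sigma) \ge \underline{D}_{1+s}(\rho\|\sigma'). \tag{5.76}$$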


5.5 Entropy Inequalities in Quantum Systems 


In this section, we will derive various inequalities related to the von Neumann entropy 
from the properties of the quantum relative entropy. 

Substituting $\sigma = \rho_{\mathrm{mix}}$ into the joint convexity of the quantum relative entropy (5.38), we obtain the concavity of the von Neumann entropy as follows:

$$H\Bigl(\sum_{i=1}^{k} p_i\rho_i\Bigr) \ge \sum_{i=1}^{k} p_i H(\rho_i). \tag{5.77}$$


Further, as shown in Sect. 8.4, when a state $\rho^{A,B}$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ is separable, the von Neumann entropy satisfies

$$H(\rho^{A,B}) \ge H(\rho^A),\ H(\rho^B) \tag{5.78}$$

for the reduced densities $\rho^A$ and $\rho^B$ of $\rho^{A,B}$. We apply the inequality (5.78) to the separable state $R$ defined in (5.37). Since the von Neumann entropy of $R$ is equal to $\sum_{i=1}^{k} p_i H(\rho_i) + H(p)$, we obtain the reverse inequality of (5.77):

$$H\Bigl(\sum_{i=1}^{k} p_i\rho_i\Bigr) \le \sum_{i=1}^{k} p_i H(\rho_i) + H(p) \le \sum_{i=1}^{k} p_i H(\rho_i) + \log k. \tag{5.79}$$

In particular, if the supports of the densities $\rho_i$ are disjoint, the first inequality holds with equality.

Similar types of inequalities may also be obtained by examining the pinching $\kappa_M$ of the PVM $M$. The quantum relative entropy satisfies

$$H(\kappa_M(\rho)) - H(\rho) = D(\rho\|\kappa_M(\rho)) \ge 0. \tag{5.80}$$

Since the inequality

$$D(\rho\|\kappa_M(\rho)) \le \log|M| \tag{5.81}$$

holds [30] (see Exercise 5.28), we obtain

$$H(\rho) \le H(\kappa_M(\rho)) \le H(\rho) + \log|M|. \tag{5.82}$$
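
The two-sided bound (5.82) can be observed on a small example. The sketch below pinches a random state on $\mathbb{C}^4$ with a PVM of two rank-two projections, so that $|M| = 2$; the helper names and the use of natural logarithms are assumptions of this illustration.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def pinching(rho, projectors):
    """kappa_M(rho) = sum_w M_w rho M_w for a PVM M = {M_w}."""
    return sum(P @ rho @ P for P in projectors)

rng = np.random.default_rng(6)
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = g @ g.conj().T
rho /= np.trace(rho).real

# A PVM with |M| = 2 elements, each a rank-two projection.
P0 = np.diag([1.0, 1.0, 0.0, 0.0])
P1 = np.eye(4) - P0
pinched = pinching(rho, [P0, P1])

gain = entropy(pinched) - entropy(rho)   # equals D(rho || kappa_M(rho))
```

By (5.80) and (5.81), `gain` should lie between 0 and $\log 2$.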


Let $\rho^A$, $\rho^B$, $\rho^{A,B}$, and $\rho^{A,C}$ be the reduced density matrices of the density matrix $\rho = \rho^{A,B,C}$ on $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$. From the monotonicity of the quantum relative entropy, we obtain

$$D(\rho^{A,B,C}\|\rho^{A,C}\otimes\rho^B) \ge D(\rho^{A,B}\|\rho^A\otimes\rho^B).$$

Rewriting this inequality, we may derive the following theorem called the strong subadditivity of the von Neumann entropy.

Theorem 5.10 (Lieb and Ruskai [31, 32]) The inequality

$$H(\rho^{A,B,C}) + H(\rho^A) \le H(\rho^{A,B}) + H(\rho^{A,C}) \tag{5.83}$$

holds.
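
The strong subadditivity (5.83) can be checked on a random three-qubit state. The following is a minimal sketch; `partial_trace` is a hypothetical helper that traces out all subsystems not listed in `keep`, with subsystems ordered as in NumPy's `kron`.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def partial_trace(rho, dims, keep):
    """Trace out every subsystem whose index is not in `keep`."""
    n = len(dims)
    r = rho.reshape(dims + dims)   # axes (a_0..a_{n-1}, a'_0..a'_{n-1})
    # Remove subsystems from the highest index down so axis numbers stay valid.
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        r = np.trace(r, axis1=i, axis2=i + r.ndim // 2)
    dk = int(np.prod([dims[i] for i in keep]))
    return r.reshape(dk, dk)

rng = np.random.default_rng(5)
dims = [2, 2, 2]                   # H_A, H_B, H_C
g = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho = g @ g.conj().T
rho /= np.trace(rho).real

H_ABC = entropy(rho)
H_A = entropy(partial_trace(rho, dims, [0]))
H_AB = entropy(partial_trace(rho, dims, [0, 1]))
H_AC = entropy(partial_trace(rho, dims, [0, 2]))
ssa_gap = H_AB + H_AC - H_ABC - H_A   # nonnegative by (5.83)
```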
Further, the equality condition of (5.83) is given as follows. 


Theorem 5.11 (Hayden et al. [33]) The equality in (5.83) holds if and only if there is a decomposition of the system $\mathcal{H}_A$ as

$$\mathcal{H}_A = \bigoplus_j \mathcal{H}_{A-B,j} \otimes \mathcal{H}_{A-C,j} \tag{5.84}$$

into a direct (orthogonal) sum of tensor products such that

$$\rho^{A,B,C} = \bigoplus_j q_j\, \rho_j^{A,B} \otimes \rho_j^{A,C} \tag{5.85}$$

with states $\rho_j^{A,B}$ on $\mathcal{H}_B \otimes \mathcal{H}_{A-B,j}$ and $\rho_j^{A,C}$ on $\mathcal{H}_C \otimes \mathcal{H}_{A-C,j}$, and probability distribution $q_j$.


In particular, when $\mathcal{H}_A$ is one-dimensional,

$$H(\rho^{B,C}) \le H(\rho^B) + H(\rho^C), \tag{5.86}$$

which is called the subadditivity. Let us change the notation slightly and write $H(\rho^{A,B})$ as $H_\rho(A, B)$ in order to emphasize the quantum system rather than the quantum state. The strong subadditivity is then written as

$$H_\rho(A, B, C) + H_\rho(A) \le H_\rho(A, B) + H_\rho(A, C). \tag{5.87}$$


Now, using this notation, let us define the conditional entropy $H_\rho(A|B) := H_\rho(A, B) - H_\rho(B)$. This quantity satisfies the following concavity:

$$H_\rho(A|B) \ge \sum_{i=1}^{k} p_i H_{\rho_i}(A|B), \tag{5.88}$$

where $\rho = \sum_i p_i\rho_i$.
Similarly to Sect. 2.1.1, we can define the quantum mutual information $I_\rho(A : B)$ and the quantum conditional mutual information $I_\rho(A : B|C)$ as

$$I_\rho(A : B) := H_\rho(A) + H_\rho(B) - H_\rho(AB) \tag{5.89}$$
$$I_\rho(A : B|C) := H_\rho(AC) + H_\rho(BC) - H_\rho(ABC) - H_\rho(C). \tag{5.90}$$

The positivity of quantum mutual information is equivalent to the subadditivity, and that of quantum conditional mutual information is equivalent to the strong subadditivity.

Theorem 2.10 shows that the entropy H(p) satisfies the asymptotic continuity 
in the classical case. The same property holds even in the quantum case. To see the 
asymptotic continuity in the quantum case more precisely, we introduce the Fannes 
inequality, which is particularly useful. 


Theorem 5.12 (Fannes [34]) Define

$$\eta_0(x) := \begin{cases} \eta(x) & 0 \le x \le 1/e \\ 1/e & 1/e < x, \end{cases} \tag{5.91}$$

where $\eta(x) := -x\log x$. Then, for two states $\rho$ and $\sigma$ on $\mathcal{H}$ ($\dim\mathcal{H} = d$), the inequality

$$|H(\rho) - H(\sigma)| \le \epsilon\log d + \eta_0(\epsilon) \tag{5.92}$$

holds for $\epsilon := \|\rho - \sigma\|_1$.
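
To get a feeling for (5.92), one can evaluate both sides on random states. The sketch below also evaluates the eigenvalue estimate $\sum_i |a_i - b_i| \le \|\rho - \sigma\|_1$ (for eigenvalues sorted in decreasing order) that is used in the proof. Natural logarithms and the helper names are assumptions of this illustration.

```python
import numpy as np

def eta(x):
    """eta(x) = -x log x, with eta(0) = 0."""
    return 0.0 if x == 0 else float(-x * np.log(x))

def eta0(x):
    """The truncated function eta_0 of (5.91)."""
    return eta(x) if x <= 1 / np.e else 1 / np.e

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def trace_norm(A):
    """||A||_1 for a Hermitian matrix A."""
    return float(np.abs(np.linalg.eigvalsh(A)).sum())

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(7)
d = 4
rho = random_state(d, rng)
# A nearby state: mix rho slightly with another random state.
sigma = 0.95 * rho + 0.05 * random_state(d, rng)

eps = trace_norm(rho - sigma)
fannes_gap = eps * np.log(d) + eta0(eps) - abs(entropy(rho) - entropy(sigma))

a = np.sort(np.linalg.eigvalsh(rho))[::-1]
b = np.sort(np.linalg.eigvalsh(sigma))[::-1]
eigen_gap = eps - float(np.abs(a - b).sum())
```

Both `fannes_gap` and `eigen_gap` should be nonnegative up to rounding.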
Let us consider the following lemma before proving this theorem. 


Lemma 5.5 Write the eigenvalues of the Hermitian matrices $A$ and $B$ in decreasing order (largest first) including any degeneracies, i.e., $a_1, \ldots, a_d$, $b_1, \ldots, b_d$. Then, $\|A - B\|_1 \ge \sum_{i=1}^{d} |a_i - b_i|$.

Proof Let $P := \{A - B \ge 0\}$, $X := P(A - B)$, and $Y := -(I - P)(A - B)$. Then, $X \ge 0$, $Y \ge 0$, and $A - B = X - Y$. Let $C := A + Y = B + X$. Then, $C \ge A, B$. Now let $c_i$ be the eigenvalues of $C$ arranged in decreasing order. From Exercise A.12 we know that $c_i \ge a_i, b_i$. Therefore, if $a_i - b_i \ge 0$, then $2c_i - a_i - b_i - (a_i - b_i) = 2(c_i - a_i) \ge 0$, and we obtain $2c_i - a_i - b_i \ge |a_i - b_i|$. This also holds for $a_i - b_i \le 0$, and therefore

$$\sum_i |a_i - b_i| \le \sum_i (2c_i - a_i - b_i) = \operatorname{Tr}(2C - A - B) = \operatorname{Tr}(X + Y) = \operatorname{Tr}|A - B|. \qquad ∎$$


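Proof of Theorem 5.12 We only provide a proof for $\|\rho - \sigma\|_1 \le 1/e$. See Exercise 5.35 for the case when $\|\rho - \sigma\|_1 > 1/e$. Let $a_i$, $b_i$ be the eigenvalues of $\rho$, $\sigma$ placed in decreasing order. Define $\epsilon_i := |a_i - b_i|$. Then, according to Lemma 5.5 and the assumptions of the theorem, $\epsilon_i \le 1/e \le 1/2$. From Exercise 5.34 we obtain

$$|H(\rho) - H(\sigma)| \le \sum_{i=1}^{d} |\eta(a_i) - \eta(b_i)| \le \sum_{i=1}^{d} \eta(\epsilon_i).$$

Next, define $\epsilon' := \sum_{i=1}^{d} \epsilon_i$. We find that $\sum_{i=1}^{d} \eta(\epsilon_i) = \epsilon'\sum_{i=1}^{d} \eta(\epsilon_i/\epsilon') + \eta(\epsilon')$. Since $\sum_{i=1}^{d} \eta(\epsilon_i/\epsilon')$ represents the entropy of the probability distribution $(\epsilon_1/\epsilon', \epsilon_2/\epsilon', \ldots, \epsilon_d/\epsilon')$, we see that this must be less than $\log d$. Exercise 5.34 (b) guarantees that $\eta_0$ is monotone increasing. Thus, the inequality $\epsilon = \|\rho - \sigma\|_1 \ge \epsilon'$ implies that $\eta(\epsilon') \le \eta_0(\|\rho - \sigma\|_1)$. Hence,

$$\sum_{i=1}^{d} \eta(\epsilon_i) \le \epsilon\log d + \eta_0(\epsilon). \tag{5.93}$$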


Therefore, we obtain the inequality (5.92). a 
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Finally, we address what axioms identify the von Neumann entropy H(p). It is not 
difficult to generalize Axioms K1-K3 and A1-A2 and A4 to the quantum case. K4 
can be regarded as the unitary invariance in the quantum case. However, it is not 
so easy to generalize Axioms K5 and A3 to the quantum case. Replacing K5 by 
Subadditivity, we consider the following set of axioms. 


Q1 (Normalization) $S(\rho_{{\rm mix},\mathbb{C}^k}) = \log k$. (5.94)
Q2 (Continuity) $S$ is continuous on $\mathcal{S}(\mathcal{H})$.
Q3 (Nonnegativity) $S$ is nonnegative.
Q4 (Invariance) For any unitary $U$, we have $S(\rho) = S(U \rho U^*)$. (5.95)
Q5 (Additivity) $S(\rho \otimes \sigma) = S(\rho) + S(\sigma)$. (5.96)
Q6 (Subadditivity) $S(\rho_{AB}) \le S(\rho_A) + S(\rho_B)$. (5.97)


It is known that, when a quantity S satisfies all of the above axioms, it becomes 
the von Neumann entropy H(p) [35]. 
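To make the axioms concrete, the sketch below evaluates the entropy on diagonal (classical) states and checks Q1, Q5, and Q6 numerically. This is only a sanity check in the commuting case under the assumption of natural logarithms; the helper names and sample distributions are illustrative.

```python
import math

def H(p):
    """Entropy of a diagonal state with eigenvalues p (natural logarithm)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def product(p, q):
    """Eigenvalues of the tensor product of two diagonal states."""
    return [a * b for a in p for b in q]

def marginals(joint):
    """Marginals of a joint distribution given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return pa, pb

k = 4
uniform = [1 / k] * k                 # completely mixed state on C^k
p, q = [0.7, 0.3], [0.2, 0.5, 0.3]    # two independent diagonal states
joint = [[0.4, 0.1], [0.1, 0.4]]      # a correlated diagonal state on AB
pa, pb = marginals(joint)
flat = [x for row in joint for x in row]

q1_error = abs(H(uniform) - math.log(k))            # Q1: should vanish
q5_error = abs(H(product(p, q)) - H(p) - H(q))      # Q5: should vanish
q6_slack = H(pa) + H(pb) - H(flat)                  # Q6: should be non-negative
```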


Exercises 
5.26 Show (5.83) using the monotonicity of the relative entropy. 
5.27 Show (4.3) from the concavity of von Neumann entropy. 


5.28 Show (5.81) following the steps below. 

(a) Show (5.81) for a pure state. 

(b) Show (5.81) for the general case using the joint convexity of the quantum relative 
entropy. 


5.29 Show (5.88) using (5.87). 


5.30 Show the Araki–Lieb inequality [36, 37] below using the subadditivity and
the state purification introduced in Sect. 8.1:

$$H(\rho^{A,B}) \ge |H(\rho^A) - H(\rho^B)|. \tag{5.98}$$

5.31 Show that the strong subadditivity (5.83) is equivalent to the following
inequality:

$$H_\rho(AB|C) \le H_\rho(A|C) + H_\rho(B|C). \tag{5.99}$$

5.32 Show the following inequality using the strong subadditivity (5.87):

$$H_{|u\rangle\langle u|}(A, C) + H_{|u\rangle\langle u|}(A, D) \ge H_{|u\rangle\langle u|}(A) + H_{|u\rangle\langle u|}(B). \tag{5.100}$$

5.33 Using (5.98), show that

$$|H_\rho(A|B)| \le \log d_A. \tag{5.101}$$
5.34 Show that

$$|\eta(x) - \eta(y)| \le \eta(|x - y|) \tag{5.102}$$

if $x$ and $y$ satisfy $|x - y| \le 1/2$ following the steps below.

(a) Show that $\eta(x + \epsilon) - \eta(x) \le \eta(\epsilon)$ for $x \ge 0$ and $\epsilon \ge 0$.

(b) Show that $\eta(x)$ is strictly concave and has its maximum value when $x = 1/e$.

(c) Show that $\eta(\alpha - \epsilon) - \eta(\alpha) \le \eta(1 - \epsilon) - \eta(1)$ for $\epsilon < \alpha < 1$.

(d) Show that the function $\eta(x) - \eta(1 - x)$ is strictly concave and $\eta(x) - \eta(1 - x) > 0$ for $0 < x < 1/2$.

(e) Show that $\eta(x) - \eta(x + \epsilon) \le \eta(\epsilon)$ using (c) and (d), and hence show (5.102).


5.35 Prove Theorem 5.12 for $d_1(\rho, \sigma) > 1/e$ following the steps below with the
notations given in the proof for $d_1(\rho, \sigma) \le 1/e$.

(a) Show (5.92) if $\epsilon_1 \le 1/e$, i.e., all the $\epsilon_i$ are less than $1/e$.

(b) Show that

$$|H(\rho) - H(\sigma)| \le 1/e + \epsilon' \log(d - 1) + \eta_0(\epsilon'),$$

where $\epsilon' \stackrel{\rm def}{=} \sum_{i=2}^d \epsilon_i$, if $\epsilon_1 > 1/e$.

(c) Show that $\epsilon \log d \ge \epsilon' \log(d - 1) + 1/e$ if $\epsilon_1 > 1/e$. Hence, show (5.92) in this
case.


5.36 Show that $I(p, W) \le \delta \log d + \eta_0(\delta)$ using Theorem 5.12, where $\delta \stackrel{\rm def}{=} \sum_x p(x) \|W_x - W_p\|_1$.


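5.37 Let $\rho$ and $\tilde\rho$ be two arbitrary states. For any real $0 \le \epsilon \le 1$, show that

$$|H_\rho(A|B) - H_\gamma(A|B)| \le 2\epsilon \log d_A + h(\epsilon), \tag{5.103}$$

following the steps below, where $\gamma \stackrel{\rm def}{=} (1 - \epsilon)\rho + \epsilon\tilde\rho$ [38].

(a) Using (5.88) and (5.101), show that $H_\rho(A|B) - H_\gamma(A|B) \le \epsilon(H_\rho(A|B) - H_{\tilde\rho}(A|B)) \le 2\epsilon \log d_A$.

(b) Show that $H_\gamma(B) \ge (1 - \epsilon)H_\rho(B) + \epsilon H_{\tilde\rho}(B)$.

(c) Show that $H_\gamma(AB) \le (1 - \epsilon)H_\rho(AB) + \epsilon H_{\tilde\rho}(AB) + h(\epsilon)$.

(d) Using (5.101), show that $H_\rho(A|B) - H_\gamma(A|B) \ge \epsilon(H_\rho(A|B) - H_{\tilde\rho}(A|B)) - h(\epsilon) \ge -2\epsilon \log d_A - h(\epsilon)$.

5.38 Show that

$$|H_\rho(A|B) - H_\sigma(A|B)| \le 4\epsilon \log d_A + 2h(\epsilon) \tag{5.104}$$

for states $\rho$ and $\sigma$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ and $\epsilon \stackrel{\rm def}{=} \|\rho - \sigma\|_1$ following the steps below [38].

(a) Define the states $\tilde\rho \stackrel{\rm def}{=} \frac{1}{\epsilon}|\rho - \sigma|$, $\tilde\sigma \stackrel{\rm def}{=} \frac{1-\epsilon}{\epsilon}(\rho - \sigma) + \frac{1}{\epsilon}|\rho - \sigma|$, and $\gamma \stackrel{\rm def}{=} (1 - \epsilon)\rho + \epsilon\tilde\rho$. Show that $\gamma = (1 - \epsilon)\sigma + \epsilon\tilde\sigma$.

(b) Using (5.103), show that $|H_\rho(A|B) - H_\sigma(A|B)| \le 4\epsilon \log d_A + 2h(\epsilon)$.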


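5.39 Using the above inequality, show that

$$|I_\rho(A : B) - I_\sigma(A : B)| \le 5\epsilon \log d_A + \eta_0(\epsilon) + 2h(\epsilon) \tag{5.105}$$

for states $\rho$ and $\sigma$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ and $\epsilon \stackrel{\rm def}{=} \|\rho - \sigma\|_1$.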


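5.40 Show that

$$|I_\rho(A : B|C) - I_\sigma(A : B|C)| \le 8\epsilon \log d_A d_B + 6h(\epsilon) \tag{5.106}$$

for states $\rho$ and $\sigma$ on $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$ and $\epsilon \stackrel{\rm def}{=} \|\rho - \sigma\|_1$ following the steps below [39].

(a) Show that $|I_\rho(A : B|C) - I_\sigma(A : B|C)| \le |H_\rho(A|C) - H_\sigma(A|C)| + |H_\rho(B|C) - H_\sigma(B|C)| + |H_\rho(AB|C) - H_\sigma(AB|C)|$.

(b) Show (5.106) using (5.104).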


5.41 Show the chain rules of quantum mutual information and quantum conditional
mutual information:

$$H_\rho(AB|C) = H_\rho(B|C) + H_\rho(A|BC), \tag{5.107}$$
$$I_\rho(A : BC) = I_\rho(A : C) + I_\rho(A : B|C), \tag{5.108}$$
$$I_\rho(A : BC|D) = I_\rho(A : C|D) + I_\rho(A : B|CD). \tag{5.109}$$


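5.42 Show the monotonicity $I_\rho(A : B) \ge I_{(\kappa_A \otimes \kappa_B)(\rho)}(A : B)$ for local TP-CP maps
$\kappa_A$ and $\kappa_B$.


5.43 Show the monotonicity $I_\rho(A : B|C) \ge I_{(\kappa_A \otimes \kappa_B \otimes \iota_C)(\rho)}(A : B|C)$ for local TP-CP
maps $\kappa_A$ and $\kappa_B$.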


5.44 Using (5.82), show the Hiai–Petz theorem [30] for two arbitrary states $\rho$, $\sigma$:

$$\lim_{n \to \infty} \frac{1}{n} D(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n}) \| \sigma^{\otimes n}) = D(\rho \| \sigma),$$

where $\kappa_{\sigma^{\otimes n}}$ represents the pinching of the measurement corresponding to the spectral
decomposition of $\sigma^{\otimes n}$. Hence, the equality in (3.18) holds in an asymptotic sense
when the POVM is the simultaneous spectral decomposition of $\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})$ and $\sigma^{\otimes n}$.
Combining this result with the classical Stein's lemma gives an alternate proof of
Lemma 3.6.


5.45 Show Holevo's inequality $I(M, p, W) \le I(p, W)$.


5.46 For a classical-quantum channel $W = (W_x)$ and a TP-CP map $\kappa$, we define
$\kappa(W) = (\kappa(W_x))$. Show the inequality $I(p, \kappa(W)) \le I(p, W)$.


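5.47 Given densities $\rho_i^A$ and $\rho_i^B$ on $\mathcal{H}_A$ and $\mathcal{H}_B$, show the strong concavity of von
Neumann entropy:

$$H\Big(\sum_i p_i \rho_i^A \otimes \rho_i^B\Big) \ge H\Big(\sum_i p_i \rho_i^A\Big) + \sum_i p_i H(\rho_i^B) \tag{5.110}$$

from the joint convexity of quantum relative entropy (5.50) for states $\rho_i^A \otimes \rho_i^B$ and
$\rho_{\rm mix}^A \otimes \rho_i^B$ [40].


5.48 Show that

$$H_{(\iota_A \otimes \kappa)(\rho)}(A|B) \ge H_\rho(A|B) \tag{5.111}$$

for any TP-CP map $\kappa$ on $\mathcal{H}_B$.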


5.6 Conditional Rényi Entropy and Duality 


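Finally, we consider the quantum version of the conditional extension of Rényi
entropy. For generalization of the conditional entropy, we have four kinds of conditional
Rényi entropies as

$$H_{1+s|\rho}(A|B) := -D_{1+s}(\rho \| I_A \otimes \rho_B) = -\frac{1}{s} \log {\rm Tr}\, \rho^{1+s}(I_A \otimes \rho_B^{-s}), \tag{5.112}$$

$$\widetilde{H}_{1+s|\rho}(A|B) := -\underline{D}_{1+s}(\rho \| I_A \otimes \rho_B) = -\frac{1}{s} \log {\rm Tr} \left\{ \left( \big(I_A \otimes \rho_B^{-\frac{s}{2(1+s)}}\big) \rho \big(I_A \otimes \rho_B^{-\frac{s}{2(1+s)}}\big) \right)^{1+s} \right\}, \tag{5.113}$$

$$H^{\uparrow}_{1+s|\rho}(A|B) := \max_{\sigma_B} -D_{1+s}(\rho \| I_A \otimes \sigma_B), \tag{5.114}$$

$$\widetilde{H}^{\uparrow}_{1+s|\rho}(A|B) := \max_{\sigma_B} -\underline{D}_{1+s}(\rho \| I_A \otimes \sigma_B). \tag{5.115}$$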


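Due to the relations (5.56) and (5.57), any TP-CP map $\kappa$ on $\mathcal{H}_B$ satisfies

$$H_{1+s|\rho}(A|B) \le H_{1+s|\kappa(\rho)}(A|B) \quad \text{for } s \ge -1, \tag{5.116}$$
$$\widetilde{H}_{1+s|\rho}(A|B) \le \widetilde{H}_{1+s|\kappa(\rho)}(A|B) \quad \text{for } s \ge -\tfrac{1}{2}, \tag{5.117}$$
$$H^{\uparrow}_{1+s|\rho}(A|B) \le H^{\uparrow}_{1+s|\kappa(\rho)}(A|B) \quad \text{for } s \ge -1, \tag{5.118}$$
$$\widetilde{H}^{\uparrow}_{1+s|\rho}(A|B) \le \widetilde{H}^{\uparrow}_{1+s|\kappa(\rho)}(A|B) \quad \text{for } s \ge -\tfrac{1}{2}. \tag{5.119}$$

Due to the properties of $D_{1+s}(\rho \| I_A \otimes \rho_B)$ and $\underline{D}_{1+s}(\rho \| I_A \otimes \rho_B)$, $H_{1+s|\rho}(A|B)$
and $\widetilde{H}_{1+s|\rho}(A|B)$ are monotone decreasing for $s$ and $\lim_{s \to 0} H_{1+s|\rho}(A|B) = \lim_{s \to 0} \widetilde{H}_{1+s|\rho}(A|B) = H_\rho(A|B)$. In the case of $s = 0$, they are defined as $H_\rho(A|B)$
because

$$\lim_{s \to 0} H_{1+s|\rho}(A|B) = \lim_{s \to 0} H^{\uparrow}_{1+s|\rho}(A|B) = \lim_{s \to 0} \widetilde{H}_{1+s|\rho}(A|B) = \lim_{s \to 0} \widetilde{H}^{\uparrow}_{1+s|\rho}(A|B) = H_\rho(A|B). \tag{5.120}$$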


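From the definition, we find the relation

$$H_{1+s|\rho}(A|B) \le H^{\uparrow}_{1+s|\rho}(A|B), \quad \widetilde{H}_{1+s|\rho}(A|B) \le \widetilde{H}^{\uparrow}_{1+s|\rho}(A|B). \tag{5.121}$$

The relation (3.25) implies that

$$H_{1+s|\rho}(A|B) \le \widetilde{H}_{1+s|\rho}(A|B), \quad H^{\uparrow}_{1+s|\rho}(A|B) \le \widetilde{H}^{\uparrow}_{1+s|\rho}(A|B). \tag{5.122}$$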


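According to the notation (2.40), $H_{\min|\rho}(A|B)$, $H^{\uparrow}_{\min|\rho}(A|B)$, $H_{\max|\rho}(A|B)$, $H^{\uparrow}_{\max|\rho}(A|B)$,
as well as $\widetilde{H}_{\min|\rho}(A|B)$, $\widetilde{H}^{\uparrow}_{\min|\rho}(A|B)$, $\widetilde{H}_{\max|\rho}(A|B)$, and $\widetilde{H}^{\uparrow}_{\max|\rho}(A|B)$ are defined as

$$H_{\min|\rho}(A|B) \stackrel{\rm def}{=} \lim_{s \to \infty} H_{1+s|\rho}(A|B), \quad H^{\uparrow}_{\min|\rho}(A|B) \stackrel{\rm def}{=} \lim_{s \to \infty} H^{\uparrow}_{1+s|\rho}(A|B), \tag{5.123}$$
$$\widetilde{H}_{\min|\rho}(A|B) \stackrel{\rm def}{=} \lim_{s \to \infty} \widetilde{H}_{1+s|\rho}(A|B), \quad \widetilde{H}^{\uparrow}_{\min|\rho}(A|B) \stackrel{\rm def}{=} \lim_{s \to \infty} \widetilde{H}^{\uparrow}_{1+s|\rho}(A|B), \tag{5.124}$$
$$H_{\max|\rho}(A|B) \stackrel{\rm def}{=} \lim_{s \to -1} H_{1+s|\rho}(A|B), \quad H^{\uparrow}_{\max|\rho}(A|B) \stackrel{\rm def}{=} \lim_{s \to -1} H^{\uparrow}_{1+s|\rho}(A|B), \tag{5.125}$$
$$\widetilde{H}_{\max|\rho}(A|B) \stackrel{\rm def}{=} \lim_{s \to -1} \widetilde{H}_{1+s|\rho}(A|B), \quad \widetilde{H}^{\uparrow}_{\max|\rho}(A|B) \stackrel{\rm def}{=} \lim_{s \to -1} \widetilde{H}^{\uparrow}_{1+s|\rho}(A|B). \tag{5.126}$$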


Unfortunately, these four conditional Rényi entropies are not the same in general.
Thanks to the properties of the relative Rényi entropies $D_{1+s}(\rho \| \sigma)$ and $\underline{D}_{1+s}(\rho \| \sigma)$ given
in Lemma 3.1, we have the following lemma.

Lemma 5.6 The functions $s \mapsto sH_{1+s|\rho}(A|B)$, $s\widetilde{H}_{1+s|\rho}(A|B)$, $sH^{\uparrow}_{1+s|\rho}(A|B)$, and
$s\widetilde{H}^{\uparrow}_{1+s|\rho}(A|B)$ are concave for $s \in (-1, \infty)$. The functions $s \mapsto H_{1+s|\rho}(A|B)$,
$\widetilde{H}_{1+s|\rho}(A|B)$, $H^{\uparrow}_{1+s|\rho}(A|B)$, and $\widetilde{H}^{\uparrow}_{1+s|\rho}(A|B)$ are monotonically decreasing.


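Lemma 5.7 The quantity $H^{\uparrow}_{1+s|\rho}(A|B)$ has the following form.

$$H^{\uparrow}_{1+s|\rho}(A|B) = -D_{1+s}(\rho \| I_A \otimes \sigma_B^{(1+s)}) = -\frac{1+s}{s} \log {\rm Tr}_B \left( {\rm Tr}_A\, \rho^{1+s} \right)^{\frac{1}{1+s}}, \tag{5.127}$$

where $\sigma_B^{(1+s)} \stackrel{\rm def}{=} \dfrac{({\rm Tr}_A\, \rho^{1+s})^{\frac{1}{1+s}}}{{\rm Tr}_B ({\rm Tr}_A\, \rho^{1+s})^{\frac{1}{1+s}}}$.

Proof Substituting ${\rm Tr}_A\, \rho^{1+s}$ and $\sigma_B^{-s}$ into $X$ and $Y$ in the reverse matrix Hölder
inequality (A.28) with $p = \frac{1}{1+s}$ and $q = -\frac{1}{s}$, we obtain

$$e^{sD_{1+s}(\rho \| I_A \otimes \sigma_B)} = {\rm Tr}_B ({\rm Tr}_A\, \rho^{1+s}) \sigma_B^{-s} \ge \left( {\rm Tr}_B ({\rm Tr}_A\, \rho^{1+s})^{\frac{1}{1+s}} \right)^{1+s} \left( {\rm Tr}_B\, \sigma_B^{-s \cdot (-1/s)} \right)^{-s} = \left( {\rm Tr}_B ({\rm Tr}_A\, \rho^{1+s})^{\frac{1}{1+s}} \right)^{1+s}$$

for $s \in (0, \infty]$. Since the equality holds when $\sigma_B = \sigma_B^{(1+s)}$, we obtain

$$e^{-sH^{\uparrow}_{1+s|\rho}(A|B)} = \left( {\rm Tr}_B ({\rm Tr}_A\, \rho^{1+s})^{\frac{1}{1+s}} \right)^{1+s},$$

which implies (5.127) with $s \in (0, \infty]$.
The same substitution to the matrix Hölder inequality (A.26) yields

$$e^{sD_{1+s}(\rho \| I_A \otimes \sigma_B)} \le \left( {\rm Tr}_B ({\rm Tr}_A\, \rho^{1+s})^{\frac{1}{1+s}} \right)^{1+s}$$

for $s \in (-1, 0)$. Since the equality holds when $\sigma_B = \sigma_B^{(1+s)}$, we obtain (5.127) with
$s \in (-1, 0)$. ∎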
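The closed form (5.127) can be illustrated in the commuting case, where $\rho$ is a joint probability distribution $P(a,b)$ and all matrix powers act entrywise. The sketch below assumes this classical setting and natural logarithms; the function names and the sample distribution are hypothetical. It evaluates $-D_{1+s}(\rho \| I_A \otimes \sigma_B)$ at $\sigma_B = \rho_B$ (the quantity (5.112)), at the optimizer $\sigma_B^{(1+s)}$, and compares with (5.127).

```python
import math

def renyi_cond(joint, sigma, s):
    """-D_{1+s}(P || I_A x sigma) for a joint distribution joint[a][b], s != 0."""
    total = sum(joint[a][b] ** (1 + s) * sigma[b] ** (-s)
                for a in range(len(joint)) for b in range(len(sigma)))
    return -math.log(total) / s

def column_weights(joint, s):
    """(sum_a P(a,b)^{1+s})^{1/(1+s)} for each b: commuting form of (Tr_A rho^{1+s})^{1/(1+s)}."""
    nb = len(joint[0])
    return [sum(row[b] ** (1 + s) for row in joint) ** (1 / (1 + s)) for b in range(nb)]

def renyi_cond_up(joint, s):
    """Closed form (5.127) in the commuting case."""
    return -(1 + s) / s * math.log(sum(column_weights(joint, s)))

def optimal_sigma(joint, s):
    """Normalized weights, i.e. the commuting form of sigma_B^{(1+s)}."""
    w = column_weights(joint, s)
    z = sum(w)
    return [x / z for x in w]

joint = [[0.4, 0.1], [0.05, 0.45]]            # hypothetical P(a, b)
p_b = [sum(col) for col in zip(*joint)]       # marginal on B

results = {}
for s in (0.5, -0.5):
    down = renyi_cond(joint, p_b, s)                       # (5.112)
    up = renyi_cond_up(joint, s)                           # (5.127)
    at_opt = renyi_cond(joint, optimal_sigma(joint, s), s)
    results[s] = (down, up, at_opt)
```

In each case `at_opt` should coincide with `up`, and `down` should not exceed `up`, in line with (5.121).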
Using (3.10), (3.11), and (3.16), we obtain the following lemma. 


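Lemma 5.8 When $\rho$ has the form $\sum_a P_A(a) |a\rangle\langle a| \otimes \rho_{B|A=a}$, the quantities
$\widetilde{H}_{\min|\rho}(A|B)$, $\widetilde{H}^{\uparrow}_{\min|\rho}(A|B)$, $H_{\max|\rho}(A|B)$, and $H^{\uparrow}_{\max|\rho}(A|B)$ are characterized as

$$\widetilde{H}_{\min|\rho}(A|B) = -\log \max_a P_A(a) \big\| \rho_B^{-\frac{1}{2}} \rho_{B|A=a} \rho_B^{-\frac{1}{2}} \big\|, \tag{5.128}$$

$$\widetilde{H}^{\uparrow}_{\min|\rho}(A|B) = -\log \min_{\sigma_B} \max_a P_A(a) \big\| \sigma_B^{-\frac{1}{2}} \rho_{B|A=a} \sigma_B^{-\frac{1}{2}} \big\|, \tag{5.129}$$

$$H_{\max|\rho}(A|B) = -\log \sum_{a : P_A(a) > 0} {\rm Tr}\, \{\rho_{B|A=a} > 0\} \rho_B, \tag{5.130}$$

$$H^{\uparrow}_{\max|\rho}(A|B) = -\log \min_{\sigma_B} \sum_{a : P_A(a) > 0} {\rm Tr}\, \{\rho_{B|A=a} > 0\} \sigma_B, \tag{5.131}$$

where $\sigma_B$ is a state on $\mathcal{H}_B$.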


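Further, the quantity $\widetilde{H}^{\uparrow}_{\min|\rho}(A|B)$ has the following operational meaning with respect
to state discrimination.

Lemma 5.9 (König et al. [41, Theorem 1]) When $\rho = \sum_a p_a |a\rangle\langle a| \otimes \rho_a$, i.e.,
$P_A(a) = p_a$ and $\rho_{B|A=a} = \rho_a$, we have

$$P_{\rm guess} = e^{-\widetilde{H}^{\uparrow}_{\min|\rho}(A|B)}. \tag{5.132}$$

In this scenario, when we apply the POVM $\{M_a\}$ with $M_a := \rho_B^{-\frac{1}{2}} p_a \rho_a \rho_B^{-\frac{1}{2}}$, the
correctly recovering probability is [41]

$$\sum_a p_a {\rm Tr}\, \rho_B^{-\frac{1}{2}} p_a \rho_a \rho_B^{-\frac{1}{2}} \rho_a = e^{-\widetilde{H}_{2|\rho}(A|B)}, \tag{5.133}$$

which gives a lower bound of $P_{\rm guess}$. Hence, we have

$$\widetilde{H}_{2|\rho}(A|B) \ge \widetilde{H}^{\uparrow}_{\min|\rho}(A|B). \tag{5.134}$$

Proof Choosing $\sigma_B := F / {\rm Tr}\, F$ and $x := {\rm Tr}\, F$, we have

$$\begin{aligned}
(\text{RHS of (3.84)}) &= \min_{F \ge 0 : I_A \otimes F \ge \rho_{AB}} {\rm Tr}\, F \\
&= \min_{\sigma_B \ge 0 : {\rm Tr}\, \sigma_B = 1} \ \min_{x : I_A \otimes x\sigma_B \ge \rho_{AB}} x \\
&\stackrel{(a)}{=} \min_{\sigma_B \ge 0 : {\rm Tr}\, \sigma_B = 1} \big\| (I_A \otimes \sigma_B)^{-\frac{1}{2}} \rho_{AB} (I_A \otimes \sigma_B)^{-\frac{1}{2}} \big\| \\
&= \min_{\sigma_B} \max_a P_A(a) \big\| \sigma_B^{-\frac{1}{2}} \rho_{B|A=a} \sigma_B^{-\frac{1}{2}} \big\| \stackrel{(b)}{=} e^{-\widetilde{H}^{\uparrow}_{\min|\rho}(A|B)},
\end{aligned}$$

where (a) and (b) follow from Exercise 3.13 and (5.129), respectively. Hence, (3.84)
yields (5.132). ∎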
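A minimal sketch of (5.132) in the fully classical case, where every $\rho_a$ is diagonal so that $\rho$ is a joint distribution $P(a,b) = p_a \rho_a(b)$. Under this assumption the operator norm in (5.129) reduces to $\max_{a,b} P(a,b)/\sigma_B(b)$, and the optimal guessing probability is $\sum_b \max_a P(a,b)$. The function names and sample numbers are illustrative, not from the text.

```python
import math

def guess_prob(joint):
    """Optimal guessing probability sum_b max_a P(a, b) when B is classical."""
    return sum(max(col) for col in zip(*joint))

def objective(joint, sigma):
    """max_{a,b} P(a,b) / sigma(b): commuting form of the norm minimized in (5.129)."""
    return max(joint[a][b] / sigma[b]
               for a in range(len(joint)) for b in range(len(sigma)))

joint = [[0.4, 0.1], [0.05, 0.45]]                    # hypothetical P(a, b)
p_guess = guess_prob(joint)

# Candidate minimizer: sigma(b) proportional to max_a P(a, b)
best = [max(col) / p_guess for col in zip(*joint)]
h_min_up = -math.log(objective(joint, best))          # value of (5.129) at this sigma

# Any other state sigma gives an objective at least as large
other = objective(joint, [0.5, 0.5])
```

Here `math.exp(-h_min_up)` reproduces `p_guess`, and the objective at any other $\sigma_B$ is no smaller, since $\sum_b \max_a P(a,b) = \sum_b \sigma_B(b) \max_a P(a,b)/\sigma_B(b) \le \max_{a,b} P(a,b)/\sigma_B(b)$.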


Now, we give duality relations among four kinds of Rényi entropies. Consider the
tripartite system $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$. When the state $\rho$ of the composite system
$\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$ is a pure state $|\psi\rangle\langle\psi|$, we can show that

$$H_\rho(A|B) + H_\rho(A|C) = 0, \tag{5.135}$$

which is a duality relation with respect to the conditional entropy. As a generalization
of the duality relation, we have the following theorem.


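Theorem 5.13 [42-45] When the state $\rho$ of the composite system $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$
is a pure state $|\psi\rangle\langle\psi|$, the following holds.

$$H_{\alpha|\rho}(A|B) + H_{\beta|\rho}(A|C) = 0 \quad \text{for } \alpha, \beta \in [0, 2],\ \alpha + \beta = 2, \tag{5.136}$$
$$\widetilde{H}^{\uparrow}_{\alpha|\rho}(A|B) + \widetilde{H}^{\uparrow}_{\beta|\rho}(A|C) = 0 \quad \text{for } \alpha, \beta \in [\tfrac{1}{2}, \infty],\ \tfrac{1}{\alpha} + \tfrac{1}{\beta} = 2, \tag{5.137}$$
$$H^{\uparrow}_{\alpha|\rho}(A|B) + \widetilde{H}_{\beta|\rho}(A|C) = 0 \quad \text{for } \alpha, \beta \in [0, \infty],\ \alpha \cdot \beta = 1. \tag{5.138}$$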


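Proof Firstly, we can show (5.136) as follows:

$$\begin{aligned}
-sH_{1+s|\rho}(A|B) &= \log {\rm Tr}\, \rho_{AB}^{1+s}(I_A \otimes \rho_B^{-s}) = \log {\rm Tr}\, \rho_{AB}\, \rho_{AB}^{s}(I_A \otimes \rho_B^{-s}) \\
&= \log \langle\psi| (\rho_{AB}^{s} \otimes I_C)(I_{A,C} \otimes \rho_B^{-s}) |\psi\rangle \stackrel{(a)}{=} \log \langle\psi| (I_{AB} \otimes \rho_C^{s})(\rho_{AC}^{-s} \otimes I_B) |\psi\rangle \\
&= \log {\rm Tr}\, \rho_{AC}(I_A \otimes \rho_C^{s}) \rho_{AC}^{-s} = \log {\rm Tr}\, \rho_{AC}^{1-s}(I_A \otimes \rho_C^{s}) = sH_{1-s|\rho}(A|C),
\end{aligned}$$

where (a) follows from Exercise 1.36.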


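Next, we show (5.138). Due to the expression of $H^{\uparrow}_{\alpha|\rho}(A|B)$ given in Lemma 5.7,
it is sufficient to show that

$$\frac{\alpha}{1-\alpha} \log {\rm Tr} \left\{ \left( {\rm Tr}_A(\rho_{AB}^{\alpha}) \right)^{\frac{1}{\alpha}} \right\} = \frac{\alpha}{1-\alpha} \log {\rm Tr} \left\{ \left( \big(I_A \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \rho_{AC} \big(I_A \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \right)^{\frac{1}{\alpha}} \right\} \tag{5.139}$$

because $\beta = \frac{1}{\alpha}$. To prove (5.139), we show that the operators

$${\rm Tr}_A(\rho_{AB}^{\alpha}) \quad \text{and} \quad \big(I_A \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \rho_{AC} \big(I_A \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \tag{5.140}$$

are unitarily equivalent, which is a stronger argument than (5.139). To see that this
is indeed true, note the first operator in (5.140) can be rewritten as

$$\begin{aligned}
{\rm Tr}_A(\rho_{AB}^{\alpha}) &= {\rm Tr}_A \left[ \rho_{AB}^{\frac{\alpha-1}{2}} \rho_{AB}\, \rho_{AB}^{\frac{\alpha-1}{2}} \right] \\
&= {\rm Tr}_{AC} \left\{ \big(\rho_{AB}^{\frac{\alpha-1}{2}} \otimes I_C\big) \rho_{ABC} \big(\rho_{AB}^{\frac{\alpha-1}{2}} \otimes I_C\big) \right\} \\
&\stackrel{(a)}{=} {\rm Tr}_{AC} \left\{ \big(I_{AB} \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \rho_{ABC} \big(I_{AB} \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \right\},
\end{aligned} \tag{5.141}$$

where (a) follows from Exercise 1.36 because $\rho$ is a pure state. Since

$$\big(I_{AB} \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \rho_{ABC} \big(I_{AB} \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \tag{5.142}$$

is a rank-1 matrix on the bipartite system $B$ and $AC$, the RHS of (5.141) is unitarily
equivalent with

$${\rm Tr}_B \big(I_{AB} \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \rho_{ABC} \big(I_{AB} \otimes \rho_C^{\frac{\alpha-1}{2}}\big) = \big(I_A \otimes \rho_C^{\frac{\alpha-1}{2}}\big) \rho_{AC} \big(I_A \otimes \rho_C^{\frac{\alpha-1}{2}}\big).$$

This concludes the proof of (5.138).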
Finally, we show (5.137). For a < 1, we have 


(a) 


H,(AIB) 2 —"—logmax|Ipip(Ia ® 75 Palla 
22 nem tlyls of lat 
© logmax min(|(I4 @ 05) @ Te“ |v) 
2“ logminmax(|(4 @ 03 |) ® 76. “IU, (5.143) 


where (a), (b), (c), and (d) follow from Exercise 3.12, Exercise A.13, Exercise 1.37,
and Lemma A.9, respectively. Similarly, for $\alpha > 1$ with the relation $\frac{1}{\alpha} + \frac{1}{\beta} = 2$, we
have


= | Jay 4 
Hj,,(AIC) = —glogmaxmin(\(la ® oc) @ 7 “Iv) 
a . 1_y j—1 
ar ere )®@ae “ly). (5.144) 
a oc TB 
The combination of (5.143) and (5.144) yields (5.137). | 


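Considering inequalities opposite to (5.121) and (5.122), we obtain the following
corollary, in which the second inequality in (5.145) and the first inequality in (5.146)
can be regarded as generalizations of Lemma 2.4.

Corollary 5.5 [45] Let $\rho_{AB} \in \mathcal{S}(AB)$. Then, the following inequalities hold for
$\alpha \in [\frac{1}{2}, \infty]$:

$$\widetilde{H}^{\uparrow}_{\alpha|\rho}(A|B) \le H^{\uparrow}_{2-\frac{1}{\alpha}|\rho}(A|B), \quad \widetilde{H}_{\alpha|\rho}(A|B) \le H_{2-\frac{1}{\alpha}|\rho}(A|B), \tag{5.145}$$
$$H^{\uparrow}_{\alpha|\rho}(A|B) \le H_{2-\frac{1}{\alpha}|\rho}(A|B), \quad \widetilde{H}^{\uparrow}_{\alpha|\rho}(A|B) \le \widetilde{H}_{2-\frac{1}{\alpha}|\rho}(A|B). \tag{5.146}$$

The preceding inequality (5.134) can be regarded as a special case of (5.146) with
$\alpha = \infty$.

Proof Consider a purification $\rho$ of $\rho_{AB}$ with the reference system $\mathcal{H}_C$. The relations
(5.145) follow from the combination of the relations (5.121) with the system $\mathcal{H}_A$ and
$\mathcal{H}_C$ and the duality relations (5.136) and (5.138). The relations (5.146) follow from
the combination of the relations (5.122) with the system $\mathcal{H}_A$ and $\mathcal{H}_C$ and the duality
relations (5.137) and (5.138). ∎

Exercise

5.49 Show the relations

$$H_{\alpha|\rho}(A|B_1) \ge H_{\alpha|\rho}(A|B_1 B_2), \quad H^{\uparrow}_{\alpha|\rho}(A|B_1) \ge H^{\uparrow}_{\alpha|\rho}(A|B_1 B_2) \tag{5.147}$$

for $\alpha \in [0, 2]$, and

$$\widetilde{H}_{\alpha|\rho}(A|B_1) \ge \widetilde{H}_{\alpha|\rho}(A|B_1 B_2), \quad \widetilde{H}^{\uparrow}_{\alpha|\rho}(A|B_1) \ge \widetilde{H}^{\uparrow}_{\alpha|\rho}(A|B_1 B_2) \tag{5.148}$$

for $\alpha \in [\frac{1}{2}, \infty)$.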


5.7 Proof and Construction of Stinespring and Choi-Kraus 
Representations 


In this section, we will prove Theorem 5.1 and construct the Stinespring and Choi- 
Kraus representations. First, let us consider the following theorem for completely 
positive maps, without the trace-preserving condition. 


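Theorem 5.14 Given a linear map $\kappa$ from the set of Hermitian matrices on the
$d$-dimensional system $\mathcal{H}_A$ to that on the $d'$-dimensional system $\mathcal{H}_B$, the following
conditions are equivalent.

① $\kappa$ is a completely positive map.

② $\kappa^*$ is a completely positive map.

③ $\kappa$ is a $\min\{d, d'\}$-positive map.

④ The matrix $K(\kappa)$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ is positive semidefinite.

⑤ (Stinespring representation) There exist a Hilbert space $\mathcal{H}_C$ with the same
dimension as $\mathcal{H}_B$, a pure state $\rho_0 \in \mathcal{S}(\mathcal{H}_B \otimes \mathcal{H}_C)$, and a matrix $W$ in $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$ such that $\kappa(X) = {\rm Tr}_{A,C}\, W(X \otimes \rho_0) W^*$.

⑥ (Choi–Kraus representation) There exist $dd'$ linear maps $F_1, \ldots, F_{dd'}$ from
$\mathcal{H}_A$ to $\mathcal{H}_B$ such that $\kappa(X) = \sum_i F_i X F_i^*$.

We also have Conditions ⑤' and ⑥' by deforming Conditions ⑤ and ⑥ as follows.
These conditions are also equivalent to the above conditions.

⑤' There exist Hilbert spaces $\mathcal{H}_C$ and $\mathcal{H}_{C'}$, a positive semidefinite state $\rho_0 \in \mathcal{S}(\mathcal{H}_C)$,
and a linear map $W$ from $\mathcal{H}_A \otimes \mathcal{H}_C$ to $\mathcal{H}_B \otimes \mathcal{H}_{C'}$ such that $\kappa(X) = {\rm Tr}_{C'}\, W(X \otimes \rho_0) W^*$.

⑥' There exist linear maps $F_1, \ldots, F_k$ from $\mathcal{H}_A$ to $\mathcal{H}_B$ such that $\kappa(X) = \sum_i F_i X F_i^*$.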
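Condition ④ can be illustrated on qubits. The sketch below builds a Choi-type matrix with the common convention $\sum_{i,j} |e_i\rangle\langle e_j| \otimes \kappa(|e_i\rangle\langle e_j|)$, which is an assumption here and may differ from the index convention of $K(\kappa)$ in the text by a reordering. It then evaluates the quadratic form on the singlet vector for the transpose map (positive but not completely positive) and for an amplitude damping channel (completely positive, with a hypothetical damping parameter).

```python
import math

D = 2  # qubit

def unit(i, j):
    """Matrix unit |e_i><e_j| on a D-dimensional space."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(D)] for r in range(D)]

def choi(channel):
    """sum_{i,j} |e_i><e_j| (x) channel(|e_i><e_j|) as a D^2 x D^2 matrix."""
    k = [[0.0] * (D * D) for _ in range(D * D)]
    for i in range(D):
        for j in range(D):
            out = channel(unit(i, j))
            for r in range(D):
                for c in range(D):
                    k[i * D + r][j * D + c] = out[r][c]
    return k

def quad_form(m, v):
    """<v|m|v> for a real matrix m and real vector v."""
    return sum(v[r] * m[r][c] * v[c] for r in range(len(v)) for c in range(len(v)))

def transpose_map(x):
    return [list(row) for row in zip(*x)]

def amplitude_damping(x, g=0.3):
    """F0 x F0^* + F1 x F1^* with F0 = diag(1, sqrt(1-g)), F1 = sqrt(g)|0><1|."""
    s = math.sqrt(1 - g)
    return [[x[0][0] + g * x[1][1], s * x[0][1]],
            [s * x[1][0], (1 - g) * x[1][1]]]

singlet = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
witness_transpose = quad_form(choi(transpose_map), singlet)     # negative: not PSD
witness_damping = quad_form(choi(amplitude_damping), singlet)   # non-negative
```

A negative quadratic form certifies that the matrix is not positive semidefinite, so the transpose fails ④; a single non-negative value for amplitude damping is of course only consistent with ④, not a proof of it.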


Proof Since ②⇔① has been shown in Sect. 5.1, we now show that ①⇒③⇒④⇒⑤⇒⑥⇒⑥'⇒① and ⑤⇒⑤'⇒①. Since ①⇒③, ⑤⇒⑤', and ⑥⇒⑥' by inspection,
it suffices to prove the remaining relations.

First, we will derive ③⇒④ as follows. In the following, we will show ④ only
when $d' \le d$ due to the following reason. The equivalence between ① and ② shows
that Condition ③ implies that $\kappa^*$ is a $d$-positive map for $d \le d'$. This fact derives
Condition ④ for $\kappa^*$ when ③⇒④ holds for $d' \le d$. Hence, we have $K(\kappa^*) \ge 0$, which
is equivalent to $K(\kappa) \ge 0$. Thus, we obtain ④ for $\kappa$ for $d \le d'$ from ③⇒④ with $d' \le d$.

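Since $\kappa$ is a $d'$-positive map, $\kappa \otimes \iota_B$ is a positive map ($\iota_B$ is the identity map in
$\mathcal{T}(\mathcal{H}_B)$). Let $X$ be a positive semidefinite Hermitian matrix on $\mathcal{H}_A \otimes \mathcal{H}_B$. Assume
that $X = \sum_{i,j,k,l} x^{(i,k),(j,l)} |e_i^A \otimes e_k^B\rangle\langle e_j^A \otimes e_l^B|$. Since $(\kappa \otimes \iota_B)(X) \ge 0$, we have

$$\begin{aligned}
0 &\le \langle I_B | (\kappa \otimes \iota_B)(X) | I_B \rangle \\
&= \sum_{i,j,k,l} x^{(i,k),(j,l)} \langle I_B | \left( \kappa(|e_i^A\rangle\langle e_j^A|) \otimes |e_k^B\rangle\langle e_l^B| \right) | I_B \rangle \\
&= \sum_{i,j,k,l} x^{(i,k),(j,l)} \langle e_k^B | \kappa(|e_i^A\rangle\langle e_j^A|) | e_l^B \rangle \qquad (5.149) \\
&= \sum_{i,j,k,l} x^{(i,k),(j,l)} K(\kappa)^{(j,l),(i,k)} = {\rm Tr}\, X K(\kappa).
\end{aligned}$$

Therefore, $K(\kappa) \ge 0$, and we obtain ④. In the above, we denote the vector $\sum_{k=1}^{d'} e_k^B \otimes e_k^B$
in the space $\mathcal{H}_B \otimes \mathcal{H}_B$ by $I_B$. In the derivation of (5.149), we used the fact that

$$\langle I_B | \left( |e_k^B\rangle\langle e_l^B| \otimes |e_s^B\rangle\langle e_t^B| \right) | I_B \rangle = \langle I_B | \left( |e_k^B \otimes e_s^B\rangle\langle e_l^B \otimes e_t^B| \right) | I_B \rangle = \delta_{k,s} \delta_{l,t}.$$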


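We now derive ④⇒⑤. Since $K(\kappa) \ge 0$, $\sqrt{K(\kappa)}$ exists. In what follows, we consider
a space $\mathcal{H}_C$ with a basis $e_1^C, \ldots, e_{d'}^C$. Note that the space $\mathcal{H}_C$ is isometric to the
space $\mathcal{H}_B$. Defining $U_{C,B} \stackrel{\rm def}{=} \sum_{k=1}^{d'} e_k^C \otimes e_k^B$, we have

$${\rm Tr}\, |e_i^A\rangle\langle e_j^A| \otimes (|U_{C,B}\rangle\langle U_{C,B}|)\, |e_{i'}^A \otimes e_{k'}^C \otimes e_k^B\rangle\langle e_{j'}^A \otimes e_{l'}^C \otimes e_l^B| = \delta_{j,i'} \delta_{i,j'} \delta_{k',k} \delta_{l',l}, \tag{5.150}$$

where the order of the tensor product is $\mathcal{H}_A \otimes \mathcal{H}_C \otimes \mathcal{H}_B$. Although $K(\kappa)$ is originally
a Hermitian matrix on $\mathcal{H}_A \otimes \mathcal{H}_B$, we often regard it as a Hermitian matrix on $\mathcal{H}_A \otimes \mathcal{H}_C$ because $\mathcal{H}_C$ is isometric to $\mathcal{H}_B$. Using (5.150), we have

$$\begin{aligned}
{\rm Tr}\, \kappa(X) Y &= {\rm Tr}\, (X \otimes Y) K(\kappa) = {\rm Tr}\, (X \otimes |U_{C,B}\rangle\langle U_{C,B}|)\, K(\kappa) \otimes Y \\
&= {\rm Tr}\, (X \otimes |U_{C,B}\rangle\langle U_{C,B}|) \big(\sqrt{K(\kappa)} \otimes I_B\big) (I_{A,C} \otimes Y) \big(\sqrt{K(\kappa)} \otimes I_B\big) \\
&= {\rm Tr}_B\, {\rm Tr}_{A,C} \left( \big(\sqrt{K(\kappa)} \otimes I_B\big) (X \otimes |U_{C,B}\rangle\langle U_{C,B}|) \big(\sqrt{K(\kappa)} \otimes I_B\big) \right) Y
\end{aligned}$$

for $\forall X \in \mathcal{T}(\mathcal{H}_A)$, $\forall Y \in \mathcal{T}(\mathcal{H}_B)$. Therefore, we can show that

$$\kappa(X) = {\rm Tr}_{A,C} \left[ \big(\sqrt{d' K(\kappa)} \otimes I_B\big) \left( X \otimes \frac{|U_{C,B}\rangle\langle U_{C,B}|}{d'} \right) \big(\sqrt{d' K(\kappa)} \otimes I_B\big) \right]. \tag{5.151}$$

Letting $\rho_0 = \frac{|U_{C,B}\rangle\langle U_{C,B}|}{d'}$ and $W = \sqrt{d' K(\kappa)} \otimes I_B$, we obtain ⑤.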

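Next, we show that ⑤⇒⑥. Let $\rho_0$ be $|x\rangle\langle x|$, $P$ be a projection from $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$ to $\mathcal{H}_A \otimes |x\rangle$, and $P_{i,k}$ be a projection from $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$ to $\mathcal{H}_B \otimes |e_i^A \otimes e_k^C\rangle$.
Using formula (1.29) of the partial trace, we have

$$\kappa(X) = {\rm Tr}_{A,C}\, W(X \otimes \rho_0) W^* = \sum_{i=1}^{d} \sum_{k=1}^{d'} P_{i,k} W P X P W^* P_{i,k} = \sum_{i=1}^{d} \sum_{k=1}^{d'} (P_{i,k} W P) X (P_{i,k} W P)^*.$$

We thus obtain ⑥.
Finally, we show ⑤'⇒①. From Condition ⑤', any positive semidefinite Hermitian
matrix $X$ on $\mathcal{H}_A \otimes \mathbb{C}^n$ satisfies

$$\kappa \otimes \iota_n(X) = {\rm Tr}_{C'} (W \otimes I_n)(X \otimes \rho_0)(W^* \otimes I_n) \ge 0,$$

where $I_n$ is an identity matrix in $\mathbb{C}^n$. Therefore, $\kappa$ is an $n$-positive map for arbitrary
$n$. It follows that $\kappa$ is a completely positive map, from which we obtain ①.
Concerning a proof of ⑥'⇒①, we have

$$\kappa \otimes \iota_n(X) = \sum_i (F_i \otimes I_n) X (F_i^* \otimes I_n) \ge 0$$

for a positive semidefinite Hermitian matrix $X$ on $\mathcal{H}_A \otimes \mathbb{C}^n$. Thus, we obtain ①. ∎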


Next, we prove Theorem 5.1. Thanks to Theorem 5.14, it is sufficient to show the
equivalence of Conditions ① to ⑥' in Theorem 5.1 when $\kappa$ is a completely positive
map. Indeed, ①⇒③, ⑤⇒⑤', and ⑥⇒⑥' by inspection. Concerning ⑤'⇒①
and ⑥'⇒①, it is sufficient to show the trace-preserving property because of Theorem 5.14.
Therefore, we only show ③⇒④⇒⑤⇒⑥ as follows.

We first show ③⇒④. From definition (1.26) of the partial trace we obtain

$${\rm Tr}_A\, \rho = {\rm Tr}_B\, \kappa(\rho) = {\rm Tr}_{A,B} (\rho \otimes I_B) K(\kappa) = {\rm Tr}_A\, \rho\, ({\rm Tr}_B K(\kappa))$$

for arbitrary $\rho \in \mathcal{S}(\mathcal{H}_A)$. Hence, ${\rm Tr}_B K(\kappa) = I_A$, and thus we obtain ④.

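Next, we show ④⇒⑤. Employing the notation used in the proof of ④⇒⑤ in
Theorem 5.14, we let $P$ be the projection from $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$ to $\mathcal{H}_A \otimes |U_{C,B}\rangle$.
Since any $\rho \in \mathcal{S}(\mathcal{H}_A)$ satisfies

$$\begin{aligned}
{\rm Tr}\, \rho &= {\rm Tr}_B\, {\rm Tr}_{A,C} \left( \big(\sqrt{d' K(\kappa)} \otimes I_B\big) P \rho P \big(\sqrt{d' K(\kappa)} \otimes I_B\big) \right) \\
&= {\rm Tr}_{A,C,B} \left( \big(\sqrt{d' K(\kappa)} \otimes I_B\big) P \rho P \big(\sqrt{d' K(\kappa)} \otimes I_B\big) \right),
\end{aligned}$$

we obtain

$$\left( \big(\sqrt{d' K(\kappa)} \otimes I_B\big) P \right)^* \big(\sqrt{d' K(\kappa)} \otimes I_B\big) P = P.$$

Let $\mathcal{H}_R$ be the range of $(\sqrt{d' K(\kappa)} \otimes I_B) P$ for $\mathcal{H}_A \otimes |U_{C,B}\rangle$. Then, the dimension
of $\mathcal{H}_R$ is equal to that of $\mathcal{H}_A$. The matrix $(\sqrt{d' K(\kappa)} \otimes I_B) P$ can be regarded as a
map from $\mathcal{H}_A \otimes |U_{C,B}\rangle$ to $\mathcal{H}_R$.

Let $\mathcal{H}_R^{\perp}$ be the orthogonal complementary space of $\mathcal{H}_R$ in $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$, and
$\mathcal{H}_A^{\perp}$ be the orthogonal complementary space of $\mathcal{H}_A \otimes |U_{C,B}\rangle$. Since the dimension
of $\mathcal{H}_R^{\perp}$ is equal to that of $\mathcal{H}_A^{\perp}$, there exists a unitary (i.e., metric-preserving) linear
mapping $U'$ from $\mathcal{H}_A^{\perp}$ to $\mathcal{H}_R^{\perp}$. Then, $U_\kappa \stackrel{\rm def}{=} (\sqrt{d' K(\kappa)} \otimes I_B) P \oplus U'$ is a unitary
linear map from $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C = (\mathcal{H}_A \otimes |U_{C,B}\rangle) \oplus \mathcal{H}_A^{\perp}$ to $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C = \mathcal{H}_R \oplus \mathcal{H}_R^{\perp}$. Therefore, from (5.151) we have $\kappa(\rho) = {\rm Tr}_{A,C}\, U_\kappa \left( \rho \otimes \frac{|U_{C,B}\rangle\langle U_{C,B}|}{d'} \right) U_\kappa^*$,
which gives Condition ⑤.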

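Next, we show ⑤⇒⑥ by employing the notation used in the proof of ⑤⇒⑥ in
Theorem 5.14, with $U_\kappa$ in place of $W$. Since

$$\begin{aligned}
{\rm Tr}\, \rho &= {\rm Tr}\, \kappa(\rho) = {\rm Tr}_B\, {\rm Tr}_{A,C}\, U_\kappa (\rho \otimes \rho_0) U_\kappa^* = \sum_{i=1}^{d} \sum_{k=1}^{d'} {\rm Tr}_B\, P_{i,k} U_\kappa P \rho P U_\kappa^* P_{i,k} \\
&= \sum_{i=1}^{d} \sum_{k=1}^{d'} {\rm Tr}_B\, (P_{i,k} U_\kappa P) \rho (P_{i,k} U_\kappa P)^* = {\rm Tr}_A \sum_{i=1}^{d} \sum_{k=1}^{d'} (P_{i,k} U_\kappa P)^* (P_{i,k} U_\kappa P) \rho,
\end{aligned}$$

we obtain $\sum_{i=1}^{d} \sum_{k=1}^{d'} (P_{i,k} U_\kappa P)^* (P_{i,k} U_\kappa P) = I_A$. Therefore, we obtain ⑥. Further,
from the proof of ⑤⇒⑥, we obtain Lemma 5.1.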

Finally, we directly construct Stinespring representation ⑤' from Choi–Kraus
representation ⑥'. Define the map $W$ from $\mathcal{H}_A$ to $\mathcal{H}_B \otimes \mathbb{C}^k$ as

$$W(x) \stackrel{\rm def}{=} \sum_{i=1}^{k} F_i(x) \otimes e_i.$$

Then, $W$ satisfies

$${\rm Tr}_{\mathbb{C}^k}\, W \rho W^* = \sum_{i=1}^{k} F_i \rho F_i^*.$$

We obtain Condition ⑤' from ⑥' in Theorem 5.14. In Theorem 5.1, we have to
check the unitarity. From the condition $\sum_i F_i^* F_i = I$, we obtain $W^* W = I$, i.e.,
$W$ is an isometry map. Hence, it is possible to deform the map $W$ to a unitary map by
extending the input space. In this case, the state in the environment $\kappa^E(\rho)$ equals
${\rm Tr}_B\, W \rho W^* = ({\rm Tr}\, F_j^* F_i \rho)_{i,j}$. Thus, we obtain Lemma 5.1.
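The construction of $W$ from the operators $F_i$ can be sketched numerically. The example below assumes real Kraus operators of a qubit amplitude damping channel with a hypothetical damping parameter, stacks them into $W$ with row index $(b, i)$ to match the ordering $\mathcal{H}_B \otimes \mathbb{C}^k$, and compares the partial trace of $W \rho W^*$ with $\sum_i F_i \rho F_i^*$. All helper names are illustrative.

```python
import math

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(x):
    return [list(r) for r in zip(*x)]

def stinespring_isometry(kraus):
    """W x = sum_i F_i x (x) e_i; row (b, i) of W is row b of F_i (real case)."""
    k, d_out = len(kraus), len(kraus[0])
    return [kraus[i][b][:] for b in range(d_out) for i in range(k)]

def channel_from_kraus(kraus, rho):
    """sum_i F_i rho F_i^* for real Kraus operators."""
    d = len(kraus[0])
    out = [[0.0] * d for _ in range(d)]
    for f in kraus:
        term = matmul(matmul(f, rho), transpose(f))
        out = [[out[a][b] + term[a][b] for b in range(d)] for a in range(d)]
    return out

def channel_from_isometry(w, rho, k):
    """Partial trace over C^k of W rho W^*, with row index b * k + i."""
    big = matmul(matmul(w, rho), transpose(w))
    d_out = len(w) // k
    return [[sum(big[a * k + i][b * k + i] for i in range(k)) for b in range(d_out)]
            for a in range(d_out)]

g = 0.3  # damping probability, an illustrative value
kraus = [[[1.0, 0.0], [0.0, math.sqrt(1 - g)]],
         [[0.0, math.sqrt(g)], [0.0, 0.0]]]
rho = [[0.6, 0.2], [0.2, 0.4]]

W = stinespring_isometry(kraus)
gram = matmul(transpose(W), W)               # W^* W, expected to be the identity
via_kraus = channel_from_kraus(kraus, rho)
via_isometry = channel_from_isometry(W, rho, len(kraus))
```

The check `gram` equal to the identity corresponds to $\sum_i F_i^* F_i = I$; complex Kraus operators would need conjugate transposes instead of plain transposes.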


5.8 Historical Note 


5.8.1 Completely Positive Map and Quantum Relative 
Entropy 


A completely positive map was initially introduced in the mathematical context;
Stinespring [1] gave its representation theorem in the observable form, i.e., $\kappa^*(A) = P U_\kappa^* (A \otimes I) U_\kappa P$, where $P$ is the projection from the extended space to the original
space. Holevo [46] proposed that any state evolution in the quantum system could be
described by a completely positive map based on the same reason considered in this
text. After this, Lindblad [18] translated Stinespring's representation theorem to the
state form. Then, he clarified that any state evolution by a completely positive map
could be regarded as the interaction between the target system and the environment
system.

Concerning other parts of Theorem 5.1, Jamiołkowski [5] first showed the one-to-one
correspondence between a CP map $\kappa$ and a positive matrix $K(\kappa)$. After this
study, Choi [3] obtained this correspondence. He also obtained the characterization
⑥ concerning CP maps. Kraus [2] also obtained this characterization. Choi [3] also
characterized the extremal points as Lemma 5.2.

In this book, we proved the monotonicity of the quantum relative entropy based 
on the quantum Stein’s Lemma. Using this property, we derived many inequalities in 
Sects. 5.4 and 5.5. However, historically, these were proved by completely different 
approaches. First, Lieb and Ruskai [31, 32] proved the strong subadditivity of the 
von Neumann entropy (5.83) based on Lieb’s convex trace functions [47]. Using 
these functions, they derived the monotonicity of the quantum relative entropy only 
concerning the partial trace. During that period, Lindblad [20] proved the joint con- 
vexity of the quantum relative entropy (5.38). After this result, using the Stinespring’s 
representation theorem in the state form, Lindblad [18] proved the monotonicity of 
the quantum relative entropy (5.36) from that concerning the partial trace. Later, 
Uhlmann [19] invented the interpolation theory, and proved the monotonicity of the 
quantum relative entropy based on this approach. As an extension of the quantum 
relative entropy, Petz [48] generalized this kind of monotonicity to quantum $f$-relative
entropy for a matrix convex function $f$, as explained in Sect. 6.7.1. A more detailed
history will be discussed at the end of the next chapter.


5.8.2 Quantum Relative Rényi Entropy


Now, as variants of quantum relative entropy, we discuss quantum relative Rényi
entropy. The above-mentioned approach of Petz [48] contains the case with $f(x) = x^{1+s}$.
Hence, applying his method to the function $x^{1+s}$, we can derive the monotonicity
(5.56) of the relative Rényi entropy $D_{1+s}(\rho \| \sigma)$. In this book, we prove the
monotonicities of the relative Rényi entropy regarding measurements (3.19) and
(3.20) using only elementary knowledge. Moreover, the monotonicity (3.20) holds
with a larger parameter $s \le 0$ as compared with the monotonicity (5.53).

Recently, another kind of relative Rényi entropy $\underline{D}_{1+s}(\rho \| \sigma)$ was proposed by
Wilde et al. [49] and Müller-Lennert et al. [44], independently. They showed the
monotonicity (5.57) of $\underline{D}_{1+s}(\rho \| \sigma)$ for $2 \ge 1 + s > 1$ by using Lieb's concavity
theorem [47]. Then, the monotonicity (5.57) of $\underline{D}_{1+s}(\rho \| \sigma)$ was shown by Frank
et al. [29] for $1 + s \ge \frac{1}{2}$ and by Beigi [43] for $1 + s > 1$, independently. Frank
et al. [29] showed the case with $1 > 1 + s \ge \frac{1}{2}$ by using Ando's convexity theorem
[50] and the case with $1 + s > 1$ by using Lieb's concavity theorem [47].
Beigi [43] showed the case with $1 + s > 1$ by using Hölder inequalities and the
Riesz–Thorin theorem [51]. This book shows the case with $1 + s > 1$ by using the equation
$\tilde\phi(-s|\rho\|\sigma) = \lim_{n \to \infty} \frac{1}{n} \max_M \phi(-s|P^M_{\rho^{\otimes n}} \| P^M_{\sigma^{\otimes n}})$, which is a simpler proof.

Indeed, the relative Rényi entropy $\underline{D}_{1+s}(\rho \| \sigma)$ produces conditional Rényi
entropies that are different from the conditional Rényi entropies produced by the relative
Rényi entropy $D_{1+s}(\rho \| \sigma)$. Since each relative Rényi entropy produces two kinds
of conditional Rényi entropies, we have four kinds of conditional Rényi entropies.
These conditional Rényi entropies are linked to each other via the duality relation
(Theorem 5.13). Firstly, Tomamichel et al. [42] showed the relation (5.136). Then,
Müller-Lennert et al. [44] and Beigi [43] independently showed the relation (5.137). Finally,
Tomamichel et al. [45] linked the remaining two kinds of conditional Rényi entropies
as (5.138). Indeed, as shown by König et al. [41, Theorem 1], the limit of one of the
conditional Rényi entropies, $\widetilde{H}^{\uparrow}_{\min|\rho}(A|B)$, has an interesting operational meaning as
Lemma 5.9.


5.9 Solutions of Exercises 


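Exercise 5.1 When $\kappa$ is trace-preserving, (5.2) with $Y = I$ implies that ${\rm Tr}\, XI = {\rm Tr}\, \kappa(X) I = {\rm Tr}\, X \kappa^*(I)$ for $X \in \mathcal{T}(\mathcal{H}_A)$, which implies $\kappa^*(I) = I$. Conversely, when
$\kappa^*$ is identity-preserving, ${\rm Tr}\, \kappa(X) I = {\rm Tr}\, X \kappa^*(I) = {\rm Tr}\, XI$ for $X \in \mathcal{T}(\mathcal{H}_A)$.


Exercise 5.2 Considering the vector $\begin{pmatrix} x|a\rangle \\ \kappa^*(X)^*|a\rangle \end{pmatrix}$, we have

$$x^2 \langle a|\kappa^*(X^*X)|a\rangle + 2x \langle a|\kappa^*(X)\kappa^*(X)^*|a\rangle + \langle a|\kappa^*(X)\kappa^*(X)^*|a\rangle \ge 0.$$

Since the discriminant is non-positive, we have $\langle a|\kappa^*(X^*X)|a\rangle \ge \langle a|\kappa^*(X)\kappa^*(X)^*|a\rangle$, which yields (5.3).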

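Exercise 5.3 Let $\mathcal{H}_D$ be $\mathcal{H}_C \otimes \mathcal{H}_B$, and consider the unitary matrix corresponding
to the replacement $W : u \otimes v \mapsto v \otimes u$ in $\mathcal{H}_A \otimes \mathcal{H}_B$, and define $V \stackrel{\rm def}{=} (W \otimes I_C) U$.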


Exercise 5.4 Since }7; ui,j, ; = 6),;', we have 


=EDws vEH, Fi = Derr 
i 


Exercise 5.5 


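(a) Given a Choi–Kraus representation $\{F_i\}$ of the TP-CP map $\kappa$, there exists a unitary
matrix $u_{j,i}$ such that all of the non-zero matrices among $F'_j \stackrel{\rm def}{=} \sum_i u_{j,i} F_i$ are linearly
independent.
(b) Due to the condition for the Choi–Kraus representation, the map $V : |x\rangle \mapsto (F_i|x\rangle)_i$
from $\mathcal{H}_A$ to $\mathcal{H}_B \otimes \mathbb{C}^d$ is an isometry. Similarly, we define the isometry $V'$. Similar to
the proof of Theorem 5.2, we find that the states $V|\Phi\rangle\langle\Phi|V^*$ and $V'|\Phi\rangle\langle\Phi|V'^*$ are
purifications of $\kappa(|\Phi\rangle\langle\Phi|)$. Hence, due to Lemma 8.1, there exists a partial isometry
$\mathcal{V} = (v_{i,j})$ from $\mathbb{C}^d$ to $\mathbb{C}^{d'}$ such that $\mathcal{V} V|\Phi\rangle\langle\Phi|V^* \mathcal{V}^* = V'|\Phi\rangle\langle\Phi|V'^*$. This relation
shows that $F'_i = \sum_j v_{i,j} F_j$.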


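Exercise 5.6 Since $S_i S_j S_i^* = -S_j$ for $i \ne j$ and $S_j S_j S_j^* = S_j$, we have $\sum_{i=1}^{3} S_i S_j S_i^* = -S_j$. When $\rho = \frac{1}{2}\big(I + \sum_{j=1}^{3} x_j S_j\big)$, we have

$$\begin{aligned}
\frac{3\lambda + 1}{4} \rho + \frac{1 - \lambda}{4} \sum_{i=1}^{3} S_i \rho S_i^*
&= \frac{3\lambda + 1}{4} \rho + \frac{1 - \lambda}{4} \cdot \frac{1}{2} \Big( 3I - \sum_{j=1}^{3} x_j S_j \Big) \\
&= \frac{3\lambda + 1}{4} \rho + \frac{1 - \lambda}{4} (2I - \rho) = \lambda \rho + (1 - \lambda)({\rm Tr}\, \rho) \rho_{\rm mix}.
\end{aligned}$$

Exercise 5.7 Consider the Hilbert space $\mathcal{H}_B$ produced by $|\omega\rangle$. Apply Condition ⑤
of Theorem 5.1 to the entanglement breaking channel $\kappa_{M,W}$ given in Theorem 5.4
with $W_\omega = |\omega\rangle\langle\omega|$. Finally, consider the measurement $\{|\omega\rangle\langle\omega| \otimes I_C\}$.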


Exercise 5.8 This relation follows from $X_d Z_d = \omega Z_d X_d$.

Exercise 5.9 

(a) Since (X)Z5) (KX) ZK) = wi (XI TI"), we have 
d-1 d- 


y3 Sl 2 xh CSAS jue: 
=0 k=0 


wt’ (i 
eZ 


d-1 d-1 
1 


Fd Dy Xs Zi XK] Zi) |, 


j”'=0 k”=0 


where j" = j+ j’andk” =k+k’. 

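(b) $Z_d A Z_d^{-1} = \sum_{j,k} a^{j,k} \omega^{j-k} |u_j\rangle\langle u_k|$. The relation $Z_d A = A Z_d$ implies that
$\sum_{j,k} a^{j,k} \omega^{j-k} |u_j\rangle\langle u_k| = Z_d A Z_d^{-1} = A = \sum_{j,k} a^{j,k} |u_j\rangle\langle u_k|$. Thus, $a^{j,k} = 0$ for
$j \ne k$.

(c) The $(j-1)$-th diagonal element of $X_d A X_d^{-1}$ is the $j$-th diagonal element of $A$. The
relation $X_d A = A X_d$ implies that $X_d A X_d^{-1} = A$. Thus, all of the diagonal elements
of $A$ are the same.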

(d) Due to (b) and (c), (a) implies that (@ Te )) isa 
constant times of 7. Comparing the traces of both sides of (5.23), we obtain (5.23). 


Exercise 5.10 For any X and Y, we have 


d-1 d-1 
ra (X/.Zi, @ Ip)" p(X4 Zi, @ Ip)(X ®@ Y) 
j=0 k=0 
1 d—1 d-1 
=Te pay DY) DAZ @ In (XB Y)KAZ) @ In)" 
j=0 k=0 
1 d—1 d-1 
=Tr py, Ka IT* @ Ip)*(X @ Y)(X,/Z,* ® Ip) 
j=0 k=0 
1 d—1 d-1 
=Tep | DD Ky Za X(K/Z4) | @Y = Tr p(T X)pmix @ ¥ 
j=0 k=0 


1 
=(TrX)5 Tra (Tra p)¥ = Tr(p4.. @ Tra p)(X @Y). 


Exercise 5.11 Exercise 5.8 yields that 




CS => 7 (XZ) XZ 


1 1 
= 12/3, x2, = rae x Se 
d d 


Exercise 5.12 Since the largest eigenvalue of $W_x$ is $\lambda + \frac{1-\lambda}{d} = \frac{1 + (d-1)\lambda}{d}$ and the remaining
$d - 1$ eigenvalues are $\frac{1-\lambda}{d}$, we have

$$\min_x H(W_x) = \frac{1 + (d-1)\lambda}{d} \log \frac{d}{1 + (d-1)\lambda} + \frac{(d-1)(1-\lambda)}{d} \log \frac{d}{1 - \lambda}.$$

This property holds for any input state $x$.

When we choose the input distribution $p$ as the uniform distribution on a
basis $\{|u_i\rangle\}_{i=1}^{d}$, the input mixture state is $\sum_i p(i) |u_i\rangle\langle u_i| = \rho_{\rm mix}$. Thus, $H(W_p) = H(\rho_{\rm mix}) = \log d$, which attains $\max_p H(W_p)$. Therefore, the capacity is

$$\max_p H(W_p) - \sum_x p(x) H(W_x) = \log d - \min_x H(W_x) = \frac{1 + (d-1)\lambda}{d} \log(1 + (d-1)\lambda) + \frac{(d-1)(1-\lambda)}{d} \log(1 - \lambda).$$

Since all of the output states $W_{|u_i\rangle\langle u_i|}$ commute with each other, the depolarizing
channel $\kappa_{d,\lambda}$ is pseudoclassical.
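The capacity formula of this solution can be cross-checked numerically. The sketch below assumes $0 \le \lambda \le 1$ and natural logarithms, computes the minimum output entropy from the two eigenvalues, and compares $\log d - \min_x H(W_x)$ with the closed form; the function names are illustrative.

```python
import math

def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def min_output_entropy(d, lam):
    """Entropy of the spectrum {(1+(d-1)lam)/d, (1-lam)/d with multiplicity d-1}."""
    big = (1 + (d - 1) * lam) / d
    small = (1 - lam) / d
    return -xlogx(big) - (d - 1) * xlogx(small)

def capacity(d, lam):
    """log d minus the minimum output entropy."""
    return math.log(d) - min_output_entropy(d, lam)

def capacity_closed_form(d, lam):
    """((1+(d-1)lam)/d) log(1+(d-1)lam) + ((d-1)(1-lam)/d) log(1-lam)."""
    return (xlogx(1 + (d - 1) * lam) + (d - 1) * xlogx(1 - lam)) / d

c_direct = capacity(3, 0.4)
c_closed = capacity_closed_form(3, 0.4)
```

The two limiting cases behave as expected: $\lambda = 1$ (identity channel) gives $\log d$, and $\lambda = 0$ (completely depolarizing) gives zero.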


Exercise 5.13 The map $\tau^{\otimes n}$ is the transpose on the whole space $\mathcal{H}^{\otimes n}$. Thus, it keeps
the positivity. Therefore, the transpose $\tau$ is tensor product positive.


Exercise 5.14 


(a) Choosing d;; := ae, pO, j)wi"-, we have 


d-1 
KS (p) = >. PO, A)(Zi)* pLi 
j=0 


d-1 
=>) PO, A) >) wi px slur) url = Do de. rpe,s|ue) (uel. 
j=0 kl kl 


(b) Assume that D satisfies (5.20). Then, p(0,m) = Tr DXi =7 5 ino od 
w J", Since 


d-1 . ; 
1 -imy m(l— k) = 1lifl-k= J 
Ad 0 ifl—k#j 
we have 
d-1 d—-1 = 
> po, m)w™"!* — =D5 me ow im ym l-®) gy es 
m=0 m= rh 


which implies that $\kappa^{\rm GP}_p = \kappa^{\rm PD}_D$. This fact implies that $\kappa^{\rm PD}_D$ is a generalized Pauli
channel.
(c) It is enough to show that the channel $\kappa^{\rm PD}_D$ is not a generalized Pauli channel if
(5.20) does not hold. To show this fact, it is sufficient to show that the channel $\kappa^{\rm GP}_p$
is not a phase-damping channel when the condition given in (a) does not hold, because there is a
one-to-one correspondence between generalized Pauli channels and phase-damping
channels under the condition given in (a). When the condition given in (a) does not
hold, the diagonal elements of $\kappa^{\rm GP}_p(\rho)$ are different from those of the state $\rho$. Hence,
we obtain the desired argument.


Exercise 5.15 We focus on the input system 74 spanned by {|u; We Ge = the out- 
put system 1H, spanned by {|u; ie ‘o> and the environment system 7{, spanned by 
{|u; ?))4_o- Define the isometry U from H, to Hg ® He as eam: vj|uj)) = 


J/1 PL 9 v;|uj) @ |e) + /P > 0 ¥j|Ua) @ |us ). Then, we have K7",(p) = 


Tre uel. Thus, (ag)? (0) = Trg UpU* = K74_,- 


Exercise 5.16 Under the channel ee the environment system has the n — m- 


f : + |.pns 
particle system. Hence, the channel to the environment system Is Ky 5. 7m: 


Exercise 5.17 Due to Condition (5.30), 1 + A3 and 1 — 3 are non-negative, (5.31) 
implies that 1 + A3 > (A; + Az) and 1 — Az > 4(A) — Az), which implies (5.32). 


Exercise 5.18 It is enough to consider the special case $p_0 \ge p_1 \ge p_2 \ge p_3$. We check
whether the eigenvalues of (5.34) satisfy (5.33). Since the condition $p_0 \ge p_1 \ge p_2 \ge p_3$ implies that $p_0 + p_1 - p_2 - p_3 \ge 0$ and $p_0 - p_1 + p_2 - p_3 \ge 0$, the condition
(5.33) is equivalent with $(p_0 + p_1 - p_2 - p_3) + (p_0 - p_1 + p_2 - p_3) + (p_0 - p_1 - p_2 + p_3) \le 1$ and $(p_0 + p_1 - p_2 - p_3) + (p_0 - p_1 + p_2 - p_3) - (p_0 - p_1 - p_2 + p_3) \le 1$. These two inequalities are equivalent with $p_0 \le \frac{1}{2}$ and $p_3 \ge 0$. Therefore,
we obtain the desired argument.


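Exercise 5.19 When we choose the coordinate $u_1 = u_1^A\otimes u_1^B$, $u_2 = u_1^A\otimes u_2^B$, $u_3 = u_2^A\otimes u_1^B$, $u_4 = u_2^A\otimes u_2^B$, we have
$$\mathrm{Inv}_\lambda\otimes\iota_{\mathbb{C}^2}(|\Phi_2\rangle\langle\Phi_2|) = \frac12\begin{pmatrix} 1-\lambda & 0 & 0 & 1-2\lambda\\ 0 & \lambda & 0 & 0\\ 0 & 0 & \lambda & 0\\ 1-2\lambda & 0 & 0 & 1-\lambda \end{pmatrix}.$$
This matrix is positive if and only if $(1-\lambda)^2-(1-2\lambda)^2 \ge 0$, i.e., $\frac23 \ge \lambda \ge 0$.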


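Exercise 5.20 It is enough to consider the case when $x = (x, 0, 0)$ and $y = (y, z, 0)$. Then,
$$F(\rho_x,\rho_y) = \mathrm{Tr}\sqrt{\sqrt{\rho_x}\,\rho_y\sqrt{\rho_x}} = \mathrm{Tr}\sqrt{\begin{pmatrix} \frac{(1+y)(1+x)}{4} & \frac{z\sqrt{1-x^2}}{4}\\ \frac{z\sqrt{1-x^2}}{4} & \frac{(1-y)(1-x)}{4}\end{pmatrix}}.$$
The eigenvalues of the matrix $\begin{pmatrix} \frac{(1+y)(1+x)}{4} & \frac{z\sqrt{1-x^2}}{4}\\ \frac{z\sqrt{1-x^2}}{4} & \frac{(1-y)(1-x)}{4}\end{pmatrix}$ are the solutions of
$$\left(\frac{(1+y)(1+x)}{4}-a\right)\left(\frac{(1-y)(1-x)}{4}-a\right) = \left(\frac{z\sqrt{1-x^2}}{4}\right)^2$$
for $a$. The equation is simplified to $a^2 - \frac{1+xy}{2}a + \frac{(1-y^2-z^2)(1-x^2)}{16} = 0$. The solutions $a_\pm$ satisfy $a_+a_- = \frac{(1-y^2-z^2)(1-x^2)}{16}$ and $a_++a_- = \frac{1+xy}{2}$. Thus,
$$F(\rho_x,\rho_y)^2 = (\sqrt{a_+}+\sqrt{a_-})^2 = a_++a_-+2\sqrt{a_+a_-} = \frac{1+xy+\sqrt{(1-y^2-z^2)(1-x^2)}}{2},$$
which equals the RHS of (5.35).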


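Exercise 5.21 For a POVM $M := \{M_i\}_i$, we define the POVM $\kappa^*(M) := \{\kappa^*(M_i)\}_i$. Then, Exercise 3.58 implies that
$$\tilde\phi(s|\rho\|\sigma) = \lim_{n\to\infty}\frac1n\max_M \phi(s|P^M_{\rho^{\otimes n}}\|P^M_{\sigma^{\otimes n}}) \ge \lim_{n\to\infty}\frac1n\max_M \phi(s|P^M_{\kappa(\rho)^{\otimes n}}\|P^M_{\kappa(\sigma)^{\otimes n}}) = \tilde\phi(s|\kappa(\rho)\|\kappa(\sigma))$$
for $s \le 0$.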
Exercise 5.22 


(a) It follows from (5.55). 

(b) Exercise A.16 guarantees that the inequality (5.53) does not necessarily hold for $s \le -1$ in general. However, this contradicts the conclusion of (a). Thus, by contradiction, we can conclude that the equation $\phi(s|\rho\|\sigma) = \tilde\phi(s|\rho\|\sigma)$ does not hold for $s \le -1$ in general.


Exercise 5.23 


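$$I(p,W) = D\Big(\sum_x p(x)|x\rangle\langle x|\otimes W_x\,\Big\|\,\sum_x p(x)|x\rangle\langle x|\otimes W_p\Big)$$
$$\ge D\Big((\iota\otimes\kappa)\Big(\sum_x p(x)|x\rangle\langle x|\otimes W_x\Big)\,\Big\|\,(\iota\otimes\kappa)\Big(\sum_x p(x)|x\rangle\langle x|\otimes W_p\Big)\Big)$$
$$= D\Big(\sum_x p(x)|x\rangle\langle x|\otimes\kappa(W_x)\,\Big\|\,\sum_x p(x)|x\rangle\langle x|\otimes\kappa(W_p)\Big) = I(p,\kappa(W)).$$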


Exercise 5.24 Equations (5.60) and (5.61) follow from (5.36) and (5.56), respectively. 
By substituting W, into a, (5.60) and (5.61) imply (5.62) and (5.63), respectively. 
By taking the infimum for o, (5.61) implies (5.64). Finally, by taking the infimum 
for p, (5.62) and (5.64) imply (5.65) and (5.66), respectively. 


Exercise 5.25 


(a) These inequalities can be shown by replacing o by 7-. 
(b) Since the function $x \mapsto x^{-s}$ is matrix convex for $s\in[0,1]$, we have $P\sigma^{-s}P \ge (P\sigma P)^{-s}$. Hence, we obtain (5.69) for $s\in[0,1]$. Similarly, for $s\in[-1,0]$, we have $P\sigma^{-s}P \le (P\sigma P)^{-s}$, which implies (5.69).

Since Po is P > (PoP) is for s €(0,co), we have pio Th p2 > 
p2(PoP)~* p?. Hence, using (3.9) and Lemma A.13, we obtain (5.70) for s € 
(0, 00). Similarly, since Po~ 1 P > (PoP) for s € [—,0), using 3.39 and 
Lemma A.13, we obtain (5.70) for s € [-5. 0). 

(e) Since the function x +» x~* is matrix monotone for s € [1,0], we have o~* > 
(o’)~*. Hence, we obtain (5.75) for s € [1,0]. Similarly, o~* < (o’)~*. for s € 
[—1, 0], we have Po*P < (PoP) ™, which implies (5.75). 

Since oT > (o’)~™ for s € (0,00), we have pio Tp? > > pi (a) p?. 
Hence, using (3.39) and Lemma * 13, we obtain (5.76) for s € (0, oo). Similarly, 
since oT > (0')~ T fors € [- 5,0), using (3.39) and Lemma A.13, we obtain 
(5.76) for s € [-5, 0). 


Exercise 5.26 Let $\rho_{\mathrm{mix}}$ be the completely mixed state on $\mathcal{H}_B$. Consider the relative entropy $D(\rho_{A,B,C}\|\rho_{\mathrm{mix}}\otimes\rho_{A,C})$ and the partial trace of $\mathcal{H}_C$.


Exercise 5.27 The concavity of von Neumann entropy implies that 


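$$I(\lambda p^1+(1-\lambda)p^2, W) - \big(\lambda I(p^1,W)+(1-\lambda)I(p^2,W)\big)$$
$$= H(W_{\lambda p^1+(1-\lambda)p^2}) - \sum_x\big(\lambda p^1(x)+(1-\lambda)p^2(x)\big)H(W_x)$$
$$\quad - \lambda\Big(H(W_{p^1})-\sum_x p^1(x)H(W_x)\Big) - (1-\lambda)\Big(H(W_{p^2})-\sum_x p^2(x)H(W_x)\Big)$$
$$= H(W_{\lambda p^1+(1-\lambda)p^2}) - \lambda H(W_{p^1}) - (1-\lambda)H(W_{p^2}) \ge 0.$$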


Exercise 5.28 


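(a) Use (5.80). Hence, $D(\rho\|\kappa_M(\rho)) = H(\kappa_M(\rho)) \le \log|M|$.
(b) Assume that $\rho = \sum_x p(x)|x\rangle\langle x|$. The joint convexity implies that
$$D(\rho\|\kappa_M(\rho)) \le \sum_x p(x)D\big(|x\rangle\langle x|\,\big\|\,\kappa_M(|x\rangle\langle x|)\big) \le \sum_x p(x)\log|M| = \log|M|.$$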


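Exercise 5.29 Consider the state $\begin{pmatrix} p_1\rho_1 & & 0\\ & \ddots & \\ 0 & & p_k\rho_k\end{pmatrix}$ in $\mathcal{H}_A\otimes\mathcal{H}_B\otimes\mathcal{H}_C$.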


Exercise 5.30 Let the purification of $\rho_{A,B}$ be $\rho_{A,B,C}$ for a reference system $\mathcal{H}_C$. The subadditivity (5.86) implies that

$$H(\rho_{A,B}) - H(\rho_A) + H(\rho_B) = H(\rho_C) - H(\rho_{B,C}) + H(\rho_B) \ge 0.$$


Exercise 5.31 
$$H_\rho(AB|C) - H_\rho(A|C) - H_\rho(B|C)$$
$$= H_\rho(ABC) - H_\rho(C) - H_\rho(AC) + H_\rho(C) - H_\rho(BC) + H_\rho(C)$$
$$= H_\rho(ABC) + H_\rho(C) - H_\rho(AC) - H_\rho(BC).$$


Exercise 5.32 


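$$H_{|u\rangle\langle u|}(A,C) + H_{|u\rangle\langle u|}(A,D) - H_{|u\rangle\langle u|}(A) - H_{|u\rangle\langle u|}(B)$$
$$= H_{|u\rangle\langle u|}(A,C) + H_{|u\rangle\langle u|}(A,D) - H_{|u\rangle\langle u|}(A) - H_{|u\rangle\langle u|}(A,C,D) \ge 0.$$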


Exercise 5.33 Inequality (5.98) implies that $H_\rho(A) \ge H_\rho(B) - H_\rho(A,B)$. Thus, $-H_\rho(A|B) = H_\rho(B) - H_\rho(A,B) \le H_\rho(A) \le \log d_A$.

Since $H_\rho(A) + H_\rho(B) - H_\rho(A,B) = D(\rho^{AB}\|\rho^A\otimes\rho^B) \ge 0$, we have $H_\rho(A|B) = H_\rho(A,B) - H_\rho(B) \le H_\rho(A) \le \log d_A$.


Exercise 5.34 


(a) $\frac{d\eta(x)}{dx} = -1-\log x$. Since $\frac{d}{d\epsilon}\big(\eta(x+\epsilon)-\eta(x)-\eta(\epsilon)\big) = -\log(x+\epsilon)+\log\epsilon \le 0$, the function $\eta(x+\epsilon)-\eta(x)-\eta(\epsilon)$ with the variable $\epsilon \ge 0$ takes its maximum at $\epsilon = 0$. Since $\eta(x+0)-\eta(x)-\eta(0) = 0$, we obtain the desired argument.

(b) Since $\frac{d^2\eta(x)}{dx^2} = -\frac1x < 0$, $\eta(x)$ is strictly concave. $-1-\log x = 0$ if and only if $x = 1/e$. Hence, it takes the maximum value when $x = 1/e$.

(c) Since $\frac{d}{d\epsilon}\big(\eta(\alpha-\epsilon)-\eta(\alpha)-\eta(1-\epsilon)+\eta(1)\big) = \log(\alpha-\epsilon)-\log(1-\epsilon) \le 0$, the function $\eta(\alpha-\epsilon)-\eta(\alpha)-\eta(1-\epsilon)+\eta(1)$ with the variable $\epsilon \ge 0$ takes its maximum at $\epsilon = 0$. Since $\eta(\alpha-0)-\eta(\alpha)-\eta(1-0)+\eta(1) = 0$, we obtain the desired argument.

(d) Since $\frac{d^2}{dx^2}\big(\eta(x)-\eta(1-x)\big) = -\frac1x+\frac1{1-x} < 0$ for $0 < x < 1/2$, the function $\eta(x)-\eta(1-x)$ is strictly concave for $0 < x < 1/2$. The function $\eta(x)-\eta(1-x)$ takes the value 0 at $x = 0, \frac12$. Due to the strict concavity, $\eta(x)-\eta(1-x) > 0$ for $0 < x < 1/2$.

(e) (d) implies that $\eta(x) \ge \eta(1-x)$. Thus, $\eta(\alpha-\epsilon)-\eta(\alpha) \le \eta(1-\epsilon)-\eta(1) \le \eta(\epsilon)-\eta(1) = \eta(\epsilon)$. Combining this inequality and (a), we obtain (5.102).
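
The inequalities used in parts (a), (b), and (d) can be spot-checked numerically. The following is only a sanity check on a finite grid, assuming $\eta(x) = -x\log x$ with $\eta(0) = 0$; the function names `eta` and `check_eta_properties` are illustrative and do not come from the text.

```python
import math


def eta(x: float) -> float:
    """eta(x) = -x log x, extended by continuity with eta(0) = 0."""
    return 0.0 if x == 0.0 else -x * math.log(x)


def check_eta_properties(steps: int = 200) -> bool:
    """Check (a), (b), (d) of Exercise 5.34 on a uniform grid of [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    tol = 1e-12
    # (a): eta(x + e) - eta(x) - eta(e) <= 0 whenever x + e stays in [0, 1].
    for x in grid:
        for e in grid:
            if x + e <= 1.0 and eta(x + e) - eta(x) - eta(e) > tol:
                return False
    # (b): no grid point exceeds the value at x = 1/e.
    if any(eta(x) > eta(1 / math.e) + tol for x in grid):
        return False
    # (d): eta(x) >= eta(1 - x) for 0 <= x <= 1/2.
    if any(eta(x) < eta(1 - x) - tol for x in grid if x <= 0.5):
        return False
    return True
```

A grid check of this kind cannot replace the derivative arguments above; it only guards against sign mistakes when the inequalities are reused.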


Exercise 5.35 


(a) Due to the condition $\epsilon_i \le 1/2$, (d) of Exercise 5.34 implies $\eta(\epsilon_i) \ge \eta(1-\epsilon_i)$. (a) and (c) of Exercise 5.34 imply $|\eta(a_i)-\eta(b_i)| \le \max(\eta(\epsilon_i), \eta(1-\epsilon_i))$. Therefore,

$$|H(\rho)-H(\sigma)| \le \sum_{i=1}^d|\eta(a_i)-\eta(b_i)| \le \sum_{i=1}^d\eta(\epsilon_i).$$

Combining (5.93), we obtain the desired argument.
(b) Note that |7(a1) — n(b1)| < max(n(e1), 7 — €1)) < nC/e) = 1/e. Similar to 
(a), we have S'i5 In(ai) — n(bi)| < Dia nla). < € log(d — 1) + no(€). 
(c) 1/e+ € log(d — 1) < (1/e+ €) logd < (€, + €’) logd = clogd. Since & < €
< || — oll; and 7 is monotone increasing, we have 1/e + € log(d — 1) + no(e’) 
S lle — oli logd + no(lle — olla). 


Exercise 5.36 Since $\eta_0$ is concave, we have
$$I(p,W) = \sum_x p(x)\big(H(W_p)-H(W_x)\big)$$
$$\le \sum_x p(x)\big(\|W_x-W_p\|_1\log d + \eta_0(\|W_x-W_p\|_1)\big)$$
$$\le \sum_x p(x)\|W_x-W_p\|_1\log d + \eta_0\Big(\sum_x p(x)\|W_x-W_p\|_1\Big)$$
$$= \delta\log d + \eta_0(\delta).$$


Exercise 5.37 


(a) Since the concavity (5.88) implies that H,(A|B) => (1 — ©)H,(A|B) + €H, 
(A|B). Hence, the inequality (5.101) implies that H,(A|B) — H,(A|B) < €(H, 
(A|B) — H;(A|B)) < 2€ log da. 

(b) It follows from the concavity (5.77). 

(c) It follows from the first inequality of (5.79), 

(d) (b) and (c) imply that 


H,(A|B) — H,(A|B) = H,(AB) — H,(AB) + H,(B) — H,(B) 
>e(H,(AB) — H;(AB)) + €(H5(B) — A,(B)) — h(e) 
=(H,(A|B) — H;(A|B)) — h(e) > —2«logd, — h(e). 


Exercise 5.38 


@—) d-eo+e=(1-do+eH(p—0) + 4\p-—o/) =A-Oo+ (1-9) 
(p—o)+|p—o|=U—-op+lp—ol=7. 
(b) Exercise 5.37 implies that 


|H,(A|B) — H,(A|B)| <|H,(A|B) — H,(A|B)| + | H,(A|B) — A, (Al B)| 
<4elogd, + 2h(c). 


Exercise 5.39 Exercise 5.38 and (5.92) guarantee that 


|1,(A : B) — I,(A : B)| = |H,(A) — H,(A) — H,(A|B) + H(A B)| 
<|H,(A) — H,(A) + |A,(A|B) — H,(A|B)| 
<elogd, + no(6) + 4€logd, + 2h(€) 
=e log d, + no(€) + 2h(E). 
Exercise 5.40

(a)
$$|I_\rho(A:B|C) - I_\sigma(A:B|C)|$$
$$= |H_\rho(A|C)+H_\rho(B|C)-H_\rho(AB|C)-H_\sigma(A|C)-H_\sigma(B|C)+H_\sigma(AB|C)|$$
$$\le |H_\rho(A|C)-H_\sigma(A|C)| + |H_\rho(B|C)-H_\sigma(B|C)| + |H_\rho(AB|C)-H_\sigma(AB|C)|.$$

(b) (5.104) implies that
$$|H_\rho(A|C)-H_\sigma(A|C)| + |H_\rho(B|C)-H_\sigma(B|C)| + |H_\rho(AB|C)-H_\sigma(AB|C)|$$
$$\le 4\epsilon\log d_A + 2h(\epsilon) + 4\epsilon\log d_B + 2h(\epsilon) + 4\epsilon\log d_Ad_B + 2h(\epsilon)$$
$$= 8\epsilon\log d_Ad_B + 6h(\epsilon).$$


Exercise 5.41 


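$$H_\rho(AB|C) = H_\rho(ABC) - H_\rho(C)$$
$$= H_\rho(ABC) - H_\rho(BC) + H_\rho(BC) - H_\rho(C) = H_\rho(B|C) + H_\rho(A|BC),$$

$$I_\rho(A:BC) = H_\rho(A) + H_\rho(BC) - H_\rho(ABC)$$
$$= H_\rho(A) + H_\rho(C) - H_\rho(AC) + H_\rho(BC) - H_\rho(ABC) - H_\rho(C) + H_\rho(AC)$$
$$= I_\rho(A:C) + I_\rho(A:B|C),$$

$$I_\rho(A:BC|D) = H_\rho(AD) + H_\rho(BCD) - H_\rho(ABCD) - H_\rho(D)$$
$$= H_\rho(AD) + H_\rho(CD) - H_\rho(ACD) - H_\rho(D) + H_\rho(BCD) - H_\rho(ABCD) - H_\rho(CD) + H_\rho(ACD)$$
$$= I_\rho(A:C|D) + I_\rho(A:B|CD).$$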
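
The chain rule $I_\rho(A:BC) = I_\rho(A:C) + I_\rho(A:B|C)$ can be illustrated in the commuting case, where $\rho$ is diagonal and all entropies reduce to Shannon entropies of a joint distribution. The sketch below makes that simplifying assumption; the random distribution and the helper names (`marginal`, `H`, `random_joint`) are hypothetical and serve only this illustration.

```python
import math
import random
from itertools import product


def entropy(dist):
    """Shannon entropy (natural logarithm) of a dict outcome -> probability."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginal distribution on the coordinates listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def H(joint, keep):
    return entropy(marginal(joint, keep))


def random_joint(seed=0, sizes=(2, 3, 2)):
    """A random joint distribution on A x B x C (classical, i.e. diagonal, state)."""
    rng = random.Random(seed)
    weights = {o: rng.random() for o in product(*(range(s) for s in sizes))}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


A, B, C = 0, 1, 2
joint = random_joint()

# I(A:BC) = H(A) + H(BC) - H(ABC)
i_a_bc = H(joint, [A]) + H(joint, [B, C]) - H(joint, [A, B, C])
# I(A:C) = H(A) + H(C) - H(AC)
i_a_c = H(joint, [A]) + H(joint, [C]) - H(joint, [A, C])
# I(A:B|C) = H(AC) + H(BC) - H(ABC) - H(C)
i_a_b_given_c = (H(joint, [A, C]) + H(joint, [B, C])
                 - H(joint, [A, B, C]) - H(joint, [C]))

chain_rule_gap = abs(i_a_bc - (i_a_c + i_a_b_given_c))
```

In this classical setting `chain_rule_gap` vanishes up to rounding, and `i_a_b_given_c` is non-negative, in line with strong subadditivity.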


Exercise 5.42 


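$$I_\rho(A:B) = D(\rho_{AB}\|\rho_A\otimes\rho_B) \ge D\big((\kappa_A\otimes\kappa_B)(\rho_{AB})\,\big\|\,(\kappa_A\otimes\kappa_B)(\rho_A\otimes\rho_B)\big)$$
$$= D\big((\kappa_A\otimes\kappa_B)(\rho_{AB})\,\big\|\,\kappa_A(\rho_A)\otimes\kappa_B(\rho_B)\big) = I_{(\kappa_A\otimes\kappa_B)(\rho)}(A:B).$$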


Exercise 5.43 It is sufficient to show $I_\rho(A:B|C) \ge I_{(\iota_A\otimes\kappa_B\otimes\iota_C)(\rho)}(A:B|C)$. Since any TP-CP map can be regarded as the application of an isometry and the partial trace, it is enough to show $I_\rho(A:B_1B_2|C) \ge I_\rho(A:B_1|C)$. Equation (5.109) implies that $I_\rho(A:B_1B_2|C) = I_\rho(A:B_1|C) + I_\rho(A:B_2|CB_1) \ge I_\rho(A:B_1|C)$.
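Exercise 5.44 Since $\mathrm{Tr}\,\rho^{\otimes n}\log\sigma^{\otimes n} = \mathrm{Tr}\,\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\log\sigma^{\otimes n}$, we have

$$D(\rho\|\sigma) - \frac1n D\big(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\,\big\|\,\sigma^{\otimes n}\big) = \frac1n\Big(D(\rho^{\otimes n}\|\sigma^{\otimes n}) - D\big(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\,\big\|\,\sigma^{\otimes n}\big)\Big)$$
$$= \frac1n\Big(-H(\rho^{\otimes n}) - \mathrm{Tr}\,\rho^{\otimes n}\log\sigma^{\otimes n} + H\big(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\big) + \mathrm{Tr}\,\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\log\sigma^{\otimes n}\Big)$$
$$= \frac1n\Big(H\big(\kappa_{\sigma^{\otimes n}}(\rho^{\otimes n})\big) - H(\rho^{\otimes n})\Big) \le \frac1n\log|E_{\sigma^{\otimes n}}|,$$

where the final inequality follows from (5.82). Since Lemma 3.9 guarantees that $\frac1n\log|E_{\sigma^{\otimes n}}| \to 0$, we obtain the desired argument.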


Exercise 5.45 Denote the input classical system by $X$ and the output system by $\mathcal{H}_A$. Define the state $\rho_{X,A} := \sum_x p(x)|x\rangle\langle x|\otimes W_x$. Denoting the TP-CP map from the quantum system to the classical system due to the POVM $M$ by $\kappa_M$, we have

$$I(p,W) = I_{\rho_{X,A}}(X:A) \ge I_{(\iota_X\otimes\kappa_M)(\rho_{X,A})}(X:A) = I(M,p,W).$$

Exercise 5.46

$$I(p,W) = I_{\rho_{X,A}}(X:A) \ge I_{(\iota_X\otimes\kappa)(\rho_{X,A})}(X:A) = I(p,\kappa(W)).$$
Exercise 5.47 


logda — >> piH(p)) = >> pi(—H (0) — H(p?) + logda + H(p?)) 


= ps pi(—H (ps ® p?) + logd, + H(p?)) 

= > piD(p4 ® p} \lpnix ® oP) = (> PiP) ® 07 IIPrnix ® = net 
=- u(y Pip} ® ) - (> Pip} ® iP) (te Pini + we Dnt) 
= (Soe @ it) + logd, + u( vit). 


which implies (5.110). 
Exercise 5.48 Exercise 5.42 implies that 
Ai .@)(o) (Al B) = Aosanyp)(AlB) — Huson (A) + Avram (A) 


= — Tus@ny (A: B) + H(A) = —1,(A: B) + A, (A) 
=H,(A|B) — H,(A) + A, (A) = A,(A|B). 
Exercise 5.49 The monotonicity ((a) of Exercise 5.25) with respect to the partial
trace yields 


We 


A\)(A|B1) = —Da(pas, ||La ® pa,) 
= — Do(PaB, Bla @ PB, a.) = Aojp(A|B, Bo). 


can show other relations. 
Chapter 6
Quantum Information Geometry 
and Quantum Estimation 


Abstract In Chap. 3 we examined the discrimination of two unknown quantum 
states. This chapter will consider the estimation of a parameter $\theta$, which labels an
unknown state parameterized by a continuous variable $\theta$. It is a remarkable property
of quantum mechanics that a measurement inevitably leads to the state reduction. 
Therefore, when one performs a measurement for state estimation, it is necessary to 
choose the measurement that extracts as much information as possible. This prob- 
lem is called quantum estimation, and the optimization of the measurement is an 
important topic in quantum information theory. In the classical theory of estimation 
(of probability distributions) discussed in Sect. 2.2, we saw that the estimation is 
intimately related to geometrical structures such as the inner product. We can expect 
that such geometrical structures will also play an important role in the quantum case. 
The study of geometrical structures in the space of quantum states is called quantum 
information geometry and is an important field in quantum information theory. This 
chapter will examine the geometrical structure of quantum systems and discuss its 
applications to estimation theory. 


6.1 Inner Products in Quantum Systems 


In any discussion about the geometry of quantum states, the metric plays a central 
role. To start talking about the metric, we must first discuss the quantum versions of 
the Fisher information and its associated inner product (2.95) examined in Sect. 2.2. 
Let $A, B, p$ in (2.95) be the diagonal elements of the commuting Hermitian matrices $Y, X, \rho$, respectively. The inner product (2.95) is then equal to $\mathrm{Tr}\,Y(\rho X)$. Although the trace of a product of two matrices does not depend on the order of the multiplication, the trace of the product of three or more matrices does depend on the order. If these matrices do not commute, then the inner product depends on the order of the product between $\rho$ and $X$. At least, the product $E_\rho(X)$ should be defined by a linear map $E_\rho$ satisfying the conditions

$$\mathrm{Tr}\,Y^*E_\rho(X) = \mathrm{Tr}\,E_\rho(Y)^*X, \tag{6.1}$$
$$\mathrm{Tr}\,X^*E_\rho(X) \ge 0, \tag{6.2}$$
$$E_\rho(U^*XU) = U^*E_{U\rho U^*}(X)U, \tag{6.3}$$
$$E_\rho(I) = \rho, \tag{6.4}$$
$$E_{\rho\otimes\rho'}(X\otimes X') = E_\rho(X)\otimes E_{\rho'}(X'), \tag{6.5}$$

which imply the following properties (Exercise 6.1):

$$\mathrm{Tr}\,E_\rho(X) = \mathrm{Tr}\,\rho X, \tag{6.6}$$
$$E_{\rho\otimes\rho'}(X\otimes I) = E_\rho(X)\otimes\rho'. \tag{6.7}$$


There exist at least three possible choices of $E_\rho(X)$ satisfying the above requirements:

$$E_{\rho,s}(X) \stackrel{\mathrm{def}}{=} X\circ\rho \stackrel{\mathrm{def}}{=} \frac12(\rho X + X\rho), \tag{6.8}$$
$$E_{\rho,b}(X) \stackrel{\mathrm{def}}{=} \int_0^1\rho^\lambda X\rho^{1-\lambda}\,d\lambda, \tag{6.9}$$
$$E_{\rho,r}(X) \stackrel{\mathrm{def}}{=} \rho X. \tag{6.10}$$

Here, $E_{\rho,s}$, $E_{\rho,b}$, and $E_{\rho,r}$ are defined as maps on $\mathcal{M}(\mathcal{H})$, and $X$ is not necessarily Hermitian. These extensions are unified in the general form [1]:

$$E_{\rho,p}(X) \stackrel{\mathrm{def}}{=} \int_0^1 E_{\rho,\lambda}(X)\,p(d\lambda), \tag{6.11}$$
$$E_{\rho,\lambda}(X) \stackrel{\mathrm{def}}{=} \rho^\lambda X\rho^{1-\lambda}, \tag{6.12}$$

where $p$ is an arbitrary probability distribution on $[0, 1]$. When $\rho > 0$, these maps possess inverses. The case (6.8) is a special case of (6.11) with $p(1) = p(0) = 1/2$, and the case (6.10) is a special case of (6.11) with $p(1) = 1$. In particular, the map $E_{\rho,x}$ is called symmetric when $E_{\rho,x}(X)$ is Hermitian if and only if $X$ is Hermitian. Hence, when the distribution $p$ is symmetric, i.e., $p(\lambda) = p(1-\lambda)$, the map $E_{\rho,p}$ is symmetric. These maps $E_{\rho,p}$ satisfy Conditions (6.1)–(6.5). For example, when $x = s$, $b$, or $\frac12$, the map $E_{\rho,x}$ is symmetric.
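
The three maps (6.8)–(6.10) become entrywise multiplications when $\rho$ is diagonal, which makes Conditions (6.1), (6.2), (6.4) and property (6.6) easy to check by hand or by machine. The sketch below assumes $\rho = \mathrm{diag}(\lambda_1,\dots,\lambda_n)$ with positive entries (no loss of generality in view of (6.3)); the helper names `make_map`, `satisfies_conditions`, and `is_hermitian` are illustrative only.

```python
import math


def log_mean(x: float, y: float) -> float:
    """Logarithmic mean: the integral of x**t * y**(1 - t) over t in [0, 1]."""
    return x if x == y else (x - y) / (math.log(x) - math.log(y))


def make_map(weight):
    """For rho = diag(lam), return X -> [weight(lam[j], lam[k]) * X[j][k]]."""
    def apply(lam, X):
        n = len(lam)
        return [[weight(lam[j], lam[k]) * X[j][k] for k in range(n)]
                for j in range(n)]
    return apply


E_s = make_map(lambda a, b: (a + b) / 2)  # (6.8): (rho X + X rho) / 2
E_b = make_map(log_mean)                   # (6.9): integral of rho^t X rho^(1-t)
E_r = make_map(lambda a, b: a)             # (6.10): rho X


def dagger(X):
    n = len(X)
    return [[X[k][j].conjugate() for k in range(n)] for j in range(n)]


def trace_prod(X, Y):
    """Tr(X Y)."""
    n = len(X)
    return sum(X[j][k] * Y[k][j] for j in range(n) for k in range(n))


def close(X, Y, tol=1e-12):
    n = len(X)
    return all(abs(X[j][k] - Y[j][k]) < tol for j in range(n) for k in range(n))


def is_hermitian(X):
    return close(X, dagger(X))


def satisfies_conditions(E, lam, X, Y, tol=1e-12):
    """Check (6.1), (6.2), (6.4), and (6.6) for the map E at rho = diag(lam)."""
    n = len(lam)
    identity = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
    rho = [[lam[j] if j == k else 0.0 for k in range(n)] for j in range(n)]
    EX = E(lam, X)
    c61 = abs(trace_prod(dagger(Y), EX) - trace_prod(dagger(E(lam, Y)), X)) < tol
    q = complex(trace_prod(dagger(X), EX))
    c62 = q.real >= -tol and abs(q.imag) < tol
    c64 = close(E(lam, identity), rho)
    c66 = abs(sum(EX[j][j] for j in range(n))
              - sum(lam[j] * X[j][j] for j in range(n))) < tol
    return c61 and c62 and c64 and c66


lam = [0.7, 0.3]
X = [[1 + 0j, 2 - 1j], [0.5 + 0.25j, -1 + 0j]]
Y = [[0.3 + 0j, 1j], [2 + 0j, 1 - 1j]]
Hm = [[1.0, 2 - 1j], [2 + 1j, -0.5]]  # a Hermitian test matrix
```

With these inputs, `E_s` and `E_b` send the Hermitian matrix `Hm` to Hermitian matrices while `E_r` does not, which is the distinction between symmetric and non-symmetric maps drawn in the text.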
Now, we define the following types of inner products: 


$$\langle Y, X\rangle^{(e)}_{\rho,x} \stackrel{\mathrm{def}}{=} \mathrm{Tr}\,Y^*E_{\rho,x}(X), \quad x = s, b, r, \lambda, p. \tag{6.13}$$

If $X$, $Y$, $\rho$ all commute, then these coincide with definition (2.95). These are called the SLD, Bogoljubov,¹ RLD, $\lambda$, and $p$ inner products [1–8], respectively (reasons for this will be given in the next section). Due to Conditions (6.1) and (6.2), these inner products are positive semidefinite and Hermitian (Exercise 6.2), i.e.,

$$\big(\|X\|^{(e)}_{\rho,x}\big)^2 \stackrel{\mathrm{def}}{=} \langle X, X\rangle^{(e)}_{\rho,x} \ge 0, \quad \langle Y, X\rangle^{(e)}_{\rho,x} = \big(\langle X, Y\rangle^{(e)}_{\rho,x}\big)^*. \tag{6.14}$$

¹ The Bogoljubov inner product is also called the canonical correlation in statistical mechanics. In linear response theory, it is often used to give an approximate correlation between two different physical quantities.


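From property (6.3) we have

$$\langle X, Y\rangle^{(e)}_{\rho,x} = \langle UXU^*, UYU^*\rangle^{(e)}_{U\rho U^*,x}. \tag{6.15}$$

In particular, the SLD inner product and the RLD inner product satisfy

$$\|X\otimes I_{\mathcal{H}'}\|^{(e)}_{\rho,x} = \|X\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x}, \quad x = s, r. \tag{6.16}$$

Generally, as is shown in Sect. 6.7.1, we have

$$\|X\otimes I_{\mathcal{H}'}\|^{(e)}_{\rho,x} \le \|X\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x}, \quad x = b, \lambda, p. \tag{6.17}$$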

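From here, we assume that $\rho$ is invertible. A dual inner product may be defined as

$$\langle A, B\rangle^{(m)}_{\rho,x} \stackrel{\mathrm{def}}{=} \mathrm{Tr}\,\big(E^{-1}_{\rho,x}(A)\big)^*B \tag{6.18}$$

with respect to the correspondence $A = E_{\rho,x}(X)$. Denote the norm of these inner products as

$$\big(\|A\|^{(m)}_{\rho,x}\big)^2 \stackrel{\mathrm{def}}{=} \langle A, A\rangle^{(m)}_{\rho,x}. \tag{6.19}$$

Hence, the inner product $\langle A, B\rangle^{(m)}_{\rho,x}$ is positive semidefinite and Hermitian. In particular, this inner product is called symmetric when $\langle A, B\rangle^{(m)}_{\rho,x} = \langle B, A\rangle^{(m)}_{\rho,x}$ for two Hermitian matrices $A$ and $B$. Similarly, the symmetricity is defined for the dual inner product $\langle X, Y\rangle^{(e)}_{\rho,x}$. That is, the inner product $\langle X, Y\rangle^{(e)}_{\rho,x}$ is called symmetric when $\langle X, Y\rangle^{(e)}_{\rho,x} = \langle Y, X\rangle^{(e)}_{\rho,x}$ for two Hermitian matrices $X$ and $Y$. The symmetricity of the inner product $\langle A, B\rangle^{(m)}_{\rho,x}$ is equivalent to not only the symmetricity of the dual inner product $\langle X, Y\rangle^{(e)}_{\rho,x}$, but also the symmetricity of the map $E_{\rho,x}$ (Exercise 6.3). When the inner product $\langle A, B\rangle^{(m)}_{\rho,x}$ is not symmetric, it can be symmetrized as $\langle A, B\rangle^{(m)}_{\rho,s(x)} \stackrel{\mathrm{def}}{=} \frac12\big(\langle A, B\rangle^{(m)}_{\rho,x} + \langle B, A\rangle^{(m)}_{\rho,x}\big)$, i.e., the symmetrized map $E_{\rho,s(x)}$ is defined as $E^{-1}_{\rho,s(x)}(A) = \frac12\big(E^{-1}_{\rho,x}(A) + (E^{-1}_{\rho,x}(A))^*\big)$ for any Hermitian matrix $A$. Hence, we call the inner product $\langle A, B\rangle^{(m)}_{\rho,s(x)}$ the symmetrized inner product of $\langle A, B\rangle^{(m)}_{\rho,x}$. Note that the SLD inner product is not the symmetrized inner product of the RLD inner product.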
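Similarly to (6.3), we have

$$\langle A, B\rangle^{(m)}_{\rho,x} = \langle UAU^*, UBU^*\rangle^{(m)}_{U\rho U^*,x} \tag{6.20}$$

for an arbitrary unitary matrix $U$. When a TP-CP map $\kappa$ and a state $\rho$ satisfy $\kappa(\rho) > 0$, we also define the map $\kappa_{\rho,x}$ associated with $\kappa$ and $\rho$ by the relation

$$\kappa(E_{\rho,x}(X)) = E_{\kappa(\rho),x}(\kappa_{\rho,x}(X)), \quad x = s, b, r, \lambda, p, \tag{6.21}$$

where for a non-Hermitian matrix $A$, $\kappa(A)$ is defined as $\kappa(A) \stackrel{\mathrm{def}}{=} \kappa((A+A^*)/2) - i\kappa(i(A-A^*)/2)$. This map satisfies the associativity

$$(\kappa_1\circ\kappa_2)_{\rho,x}(X) = \kappa_{1,\kappa_2(\rho),x}\circ\kappa_{2,\rho,x}(X), \quad x = s, b, r, \lambda, p. \tag{6.22}$$

Also, the relation

$$\langle\kappa^*(Y), X\rangle^{(e)}_{\rho,x} = \mathrm{Tr}\,Y^*\kappa(E_{\rho,x}(X)) = \mathrm{Tr}\,Y^*E_{\kappa(\rho),x}(\kappa_{\rho,x}(X)) = \langle Y, \kappa_{\rho,x}(X)\rangle^{(e)}_{\kappa(\rho),x} \tag{6.23}$$

holds for any $Y$. Since (6.23) can be regarded as a quantum extension of (2.109), we call the map $\kappa_{\rho,x}$ the conditional expectation with respect to the inner product $x$ (Exercise 6.12). Then, we have the following theorem.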


Theorem 6.1 The inequality

$$\|A\|^{(m)}_{\rho,x} \ge \|\kappa(A)\|^{(m)}_{\kappa(\rho),x}, \quad x = s, b, r, \lambda, p \tag{6.24}$$

holds. This inequality (6.24) is also equivalent to

$$\|X\|^{(e)}_{\rho,x} \ge \|\kappa_{\rho,x}(X)\|^{(e)}_{\kappa(\rho),x}, \quad x = s, b, r, \lambda, p. \tag{6.25}$$

When an inner product satisfies property (6.24), it is called a monotone metric. Monotonicity implies that no operation increases the amount of information. That is, if the inner product is to be considered as a measure of information, this property should be satisfied because information processing does not cause any increase in the amount of information. It is also known that an arbitrary inner product $\|A\|^{(m)}_{\rho,x}$ satisfying property (6.24) and $\|\rho^{-1}\|^{(m)}_{\rho,x} = 1$ satisfies $\|A\|^{(m)}_{\rho,s} \le \|A\|^{(m)}_{\rho,x} \le \|A\|^{(m)}_{\rho,r}$, i.e., the SLD inner product is the minimum product and the RLD inner product is the maximum product [3].

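Before proving Theorem 6.1, we need to discuss how to extend the above discussion to the case when $\rho$ is non-invertible. Even though $\rho$ is non-invertible, the map $E_{\rho,x}$ can be defined. However, it has a non-trivial kernel $\mathcal{K}_{\rho,x}(\mathcal{H})$. So, the inner product $\langle\ ,\ \rangle^{(e)}_{\rho,x}$ is degenerate. Here, we introduce the quotient space $\mathcal{M}_{\rho,x}(\mathcal{H}) := \mathcal{M}(\mathcal{H})/\mathcal{K}_{\rho,x}(\mathcal{H})$. Then, the inner product $\langle\ ,\ \rangle^{(e)}_{\rho,x}$ is non-degenerate in $\mathcal{M}_{\rho,x}(\mathcal{H})$.

Next, to discuss the other inner product $\langle\ ,\ \rangle^{(m)}_{\rho,x}$, we focus on the image $\mathcal{M}^{(m)}_{\rho,x}(\mathcal{H})$ of the map $E_{\rho,x}$. Then, the inner product $\langle\ ,\ \rangle^{(m)}_{\rho,x}$ can be defined on the space $\mathcal{M}^{(m)}_{\rho,x}(\mathcal{H})$ as a non-degenerate inner product. For example, the space $\mathcal{M}^{(m)}_{\rho,r}(\mathcal{H})$ can be characterized by using the projection $P_\rho$ to the range of $\rho$ as $\{X\in\mathcal{M}(\mathcal{H})\,|\,P_\rho X = X\}$. In this case, the space $\mathcal{M}^{(m)}_{\rho,r}(\mathcal{H})$ can be regarded as the set of representatives of the elements of the quotient space $\mathcal{M}_{\rho,r}(\mathcal{H})$. If there is no possibility for confusion, $\mathcal{M}_{\rho,x}(\mathcal{H})$ and $\mathcal{M}^{(m)}_{\rho,x}(\mathcal{H})$ are abbreviated to $\mathcal{M}_{\rho,x}$ and $\mathcal{M}^{(m)}_{\rho,x}$.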


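Proof of Theorem 6.1 Here, we prove (6.25) for $x = s, r$. The general case of (6.25) is shown assuming inequality (6.17), which will be proven in Sect. 6.7.1.

These inner products are invariant for the operations $\rho\mapsto\rho\otimes\rho_0$ and $\rho\mapsto U\rho U^*$. It is sufficient to show (6.25) in the case of partial trace because of the Stinespring representation and associativity (6.22). First, using property (6.16), we prove (6.25) for $x = s, r$. Letting $\kappa$ be the partial trace from system $\mathcal{H}\otimes\mathcal{H}'$ to subsystem $\mathcal{H}$, we have

$$\langle Y\otimes I, \kappa_{\rho,x}(X)\otimes I\rangle^{(e)}_{\rho,x} = \langle Y, \kappa_{\rho,x}(X)\rangle^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x} = \mathrm{Tr}\,Y^*\kappa(E_{\rho,x}(X))$$
$$= \mathrm{Tr}\,(Y\otimes I)^*E_{\rho,x}(X) = \langle Y\otimes I, X\rangle^{(e)}_{\rho,x}$$

for any matrix $X$ on $\mathcal{H}\otimes\mathcal{H}'$, any matrix $Y$ on $\mathcal{H}$, and any state $\rho$ on $\mathcal{H}\otimes\mathcal{H}'$. Hence, the map $\kappa_{\rho,x}$ is the projection from the space of all matrices on $\mathcal{H}\otimes\mathcal{H}'$ to the subspace of matrices $\{Y\otimes I\}$ with respect to the inner product $\langle\ ,\ \rangle^{(e)}_{\rho,x}$. Therefore, $\|X\|^{(e)}_{\rho,x} \ge \|\kappa_{\rho,x}(X)\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x}$. Hence, we obtain (6.25) for $x = s, r$.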
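Next, we proceed to the general case, i.e., the case of $x = p, b$. Let $F$ be the positive self-adjoint map on the matrix space with respect to the inner product $\langle\ ,\ \rangle^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x}$ satisfying

$$\langle Y\otimes I, Y'\otimes I\rangle^{(e)}_{\rho,x} = \langle Y, FY'\rangle^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x}. \tag{6.26}$$

Since property (6.17) implies $\|Y\otimes I\|^{(e)}_{\rho,x} = \|F^{1/2}Y\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x} \le \|Y\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x}$, we have

$$\|Y\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x} \le \|F^{-1/2}Y\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x}.$$

Hence,

$$\langle Y\otimes I, (F^{-1}\kappa_{\rho,x}(X))\otimes I\rangle^{(e)}_{\rho,x} = \langle Y, \kappa_{\rho,x}(X)\rangle^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x} = \mathrm{Tr}\,Y^*\kappa(E_{\rho,x}(X))$$
$$= \mathrm{Tr}\,(Y\otimes I)^*E_{\rho,x}(X) = \langle Y\otimes I, X\rangle^{(e)}_{\rho,x}.$$

Similarly, we can show that $\|(F^{-1}\kappa_{\rho,x}(X))\otimes I\|^{(e)}_{\rho,x} \le \|X\|^{(e)}_{\rho,x}$. Therefore, we obtain

$$\|\kappa_{\rho,x}(X)\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x} \le \|F^{-1/2}\kappa_{\rho,x}(X)\|^{(e)}_{\mathrm{Tr}_{\mathcal{H}'}\rho,x} = \|(F^{-1}\kappa_{\rho,x}(X))\otimes I\|^{(e)}_{\rho,x} \le \|X\|^{(e)}_{\rho,x}.$$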
Exercises
6.1 Show (6.6) and (6.7) by using Conditions (6.1)–(6.5).
6.2 Show (6.14) by using Conditions (6.1) and (6.2).


6.3 Show that the following conditions are equivalent. (Hint: Use Condition (6.1).)


① The inner product $\langle A, B\rangle^{(m)}_{\rho,x}$ is symmetric.
② The inner product $\langle X, Y\rangle^{(e)}_{\rho,x}$ is symmetric.
③ The map $E_{\rho,x}$ is symmetric.


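6.4 Prove the following facts for a traceless Hermitian matrix $A$ and a density matrix $\rho$ of the form $\rho = \sum_{j=1}^d\lambda_jE_j$ and $\mathrm{rank}\,E_j = 1$.

(a) Show that $(x+y)/2 \ge \mathrm{Lm}(x,y)$, where $\mathrm{Lm}(x,y)$ is the logarithmic average defined below.

$$\mathrm{Lm}(x,y) \stackrel{\mathrm{def}}{=} \begin{cases}\dfrac{x-y}{\log x-\log y} & \text{if } x\ne y,\\ x & \text{if } x = y.\end{cases}$$

Also show that the equality holds if and only if $x = y$.

(b) Show the following [3]:

$$\big(\|A\|^{(m)}_{\rho,s}\big)^2 = \sum_{j,k=1}^d\frac{2}{\lambda_j+\lambda_k}\mathrm{Tr}\,AE_jAE_k,$$
$$\big(\|A\|^{(m)}_{\rho,b}\big)^2 = \sum_{j,k=1}^d\frac{1}{\mathrm{Lm}(\lambda_j,\lambda_k)}\mathrm{Tr}\,AE_jAE_k.$$

(c) Show that $\mathrm{Tr}\,(A\rho-\rho A)(A\rho-\rho A)^* = \sum_{j,k=1}^d(\lambda_j-\lambda_k)^2\,\mathrm{Tr}\,AE_jAE_k$.

(d) Show the inequality $\|A\|^{(m)}_{\rho,b} \ge \|A\|^{(m)}_{\rho,s}$. Also, show the equivalence of the following.

① $\|A\|^{(m)}_{\rho,b} = \|A\|^{(m)}_{\rho,s}$.
② $[\rho, A] = 0$.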

6.5 For the pinching $\kappa_M$ of a PVM $M$, we define $\kappa_{M,\rho,s} := (\kappa_M)_{\rho,s}$. Show the following facts.

(a) For any matrix $X$, show that $\kappa_{M,\rho,s}(X)$ commutes with every element $M_i$.

(b) Assume that $\kappa_M(\rho) > 0$. Show that $\kappa_{M,\rho,s}(X) = X$ if and only if every $M_i$ commutes with $X$.

(c) Show that $\kappa_{M,\rho,s}\circ\kappa_{M,\rho,s} = \kappa_{M,\rho,s}$, i.e., $\kappa_{M,\rho,s}$ can be regarded as a projection.

(d) Show that $\langle Y, X\rangle^{(e)}_{\rho,s} = \langle Y, \kappa_{M,\rho,s}(X)\rangle^{(e)}_{\kappa_M(\rho),s}$ if every matrix $M_i$ commutes with $Y$.
(e) Verify that the above is true for the RLD case.


6.6 Show that the following two conditions are equivalent for the Hermitian matrix $A$, the state $\rho > 0$, and the pinching $\kappa_M$ corresponding to PVM $M = \{M_i\}$.

① $\|A\|^{(m)}_{\rho,s} = \|\kappa_M(A)\|^{(m)}_{\kappa_M(\rho),s}$.

② $X := E^{-1}_{\rho,s}(A)$ commutes with $M_i$ for all $i$.


6.7 Show the inequality $\|A\|^{(m)}_{\rho,b} \ge \|\kappa_M(A)\|^{(m)}_{\kappa_M(\rho),s}$ with the same assumption as above. Also, show the equivalence of the following:

① $\|A\|^{(m)}_{\rho,b} = \|\kappa_M(A)\|^{(m)}_{\kappa_M(\rho),s}$.

② There exists a Hermitian matrix $X$ such that it commutes with every $M_i$ and satisfies $A = \rho X = X\rho$.


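6.8 Show that $\|X\rho Y\|_1 \le \sqrt{\mathrm{Tr}\,\rho YY^*}\sqrt{\mathrm{Tr}\,\rho X^*X}$ by the Schwarz inequality for the inner product $\langle X, Y\rangle^{(e)}_{\rho,r} = \mathrm{Tr}\,\rho YX^*$, where $X, Y$ are matrices and $\rho$ is a density matrix. Note that $\|\cdot\|_1$ denotes the trace norm (Sect. A.3).


6.9 Given a matrix $X$ and a density matrix $\rho$, show that

$$\|X\|_1 \le \sqrt{\mathrm{Tr}\,\rho^{-1}XX^*}\sqrt{\mathrm{Tr}\,\rho U^*U} = \sqrt{\mathrm{Tr}\,\rho^{-1}XX^*}, \tag{6.27}$$

where $U$ is a unitary matrix satisfying $\|X\|_1 = \mathrm{Tr}\,XU$.


6.10 Given a matrix $X$ and a density matrix $\rho$, show that

$$\|X\|_1 \le \sqrt{\mathrm{Tr}\,\rho^{-1/2}X\rho^{-1/2}X^*}\sqrt{\mathrm{Tr}\,\rho^{1/2}U^*\rho^{1/2}U} \le \sqrt{\mathrm{Tr}\,\rho^{-1/2}X\rho^{-1/2}X^*}, \tag{6.28}$$

where $U$ is a unitary matrix satisfying $\|X\|_1 = \mathrm{Tr}\,XU$.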


6.11 Assume that the distribution $p$ has zero measure at $\lambda = 1, 0$. Let $\rho$ be a pure state $|y\rangle\langle y|$. For the equality condition of the inequality (6.17), show the following. The equality of the inequality $\|X\otimes I_{\mathcal{H}'}\|^{(e)}_{\rho,p} \le \|X\|^{(e)}_{\sigma,p}$ holds with $\mathrm{Tr}_{\mathcal{H}'}\rho = \sigma$ if and only if $X$ is a constant times $I$.


6.12 Let $\kappa$ be the pinching $\kappa_M$ of a PVM $M = \{M_i\}$. Define $\kappa_{M,\rho,x}$ in the same way as in Exercise 6.5 for $x = s, r$. Show that the map $\kappa_{M,\rho,x}$ can be regarded as the conditional expectation to the matrix subspace $\{X\,|\,[X, M_i] = 0\ \forall i\}$ for $x = s, r$. That is, show that the map $\kappa_{M,\rho,x}$ is the dual map of the inclusion of the matrix subspace $\{X\,|\,[X, M_i] = 0\ \forall i\}$ for $x = s, r$. (In general, the conditional expectation can be defined by (2.110) when the map $\kappa$ is the dual map of the inclusion of a matrix subspace $\mathcal{U}$.)


6.13 Show that $E_{\rho,x}$ is a map from the set of Hermitian matrices to itself for $x = s, \frac12, b$.
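
As a worked illustration of Exercise 6.4 (b) and (d), consider the qubit case. This is a sketch under assumptions made only for this example: $\rho$ is taken diagonal with eigenvalues $\lambda_1, \lambda_2 > 0$, and the traceless Hermitian matrix $A$ is parametrized by a real number $a$ and a complex number $c$, neither of which appears in the exercise itself.

```latex
\rho = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix},\qquad
A = \begin{pmatrix} a & c\\ \bar{c} & -a\end{pmatrix},\qquad
\lambda_1+\lambda_2 = 1 .
% With E_j = |u_j><u_j| one has Tr A E_j A E_k = |A_{jk}|^2, so (b) gives
\left(\|A\|^{(m)}_{\rho,s}\right)^2
  = \frac{a^2}{\lambda_1}+\frac{a^2}{\lambda_2}+\frac{4|c|^2}{\lambda_1+\lambda_2},
\qquad
\left(\|A\|^{(m)}_{\rho,b}\right)^2
  = \frac{a^2}{\lambda_1}+\frac{a^2}{\lambda_2}
    +\frac{2|c|^2}{\mathrm{Lm}(\lambda_1,\lambda_2)} .
% Subtracting and using (a), i.e. Lm(x,y) <= (x+y)/2:
\left(\|A\|^{(m)}_{\rho,b}\right)^2-\left(\|A\|^{(m)}_{\rho,s}\right)^2
  = 2|c|^2\left(\frac{1}{\mathrm{Lm}(\lambda_1,\lambda_2)}
      -\frac{2}{\lambda_1+\lambda_2}\right)\;\ge\;0 .
```

The difference vanishes exactly when $c = 0$ or $\lambda_1 = \lambda_2$. Since the off-diagonal entry of $[\rho, A]$ is $(\lambda_1-\lambda_2)c$, this is the condition $[\rho, A] = 0$ of part (d) in this two-dimensional example.
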
6.2 Metric-Induced Inner Products 


In this section we treat the space of quantum states in a geometrical framework. In particular, we will discuss the properties of the metric, which will be defined in terms of the inner product discussed in the previous section. Consider a set of quantum states $\{\rho_\theta\,|\,\theta\in\mathbb{R}\}$ (a state family) parameterized by a single real number $\theta$. We also assume that $\theta\mapsto\rho_\theta$ is continuous and differentiable up to the second order. The metric then represents the distance between two quantum states $\rho_{\theta_0}$, $\rho_{\theta_0+\epsilon}$ separated by a small $\epsilon > 0$. The difference in this case is approximately equal to $\frac{d\rho_\theta}{d\theta}(\theta_0)\epsilon$. When we focus on the norm $\|\cdot\|^{(m)}_{\rho_{\theta_0},x}$, the Fisher metric $J_{\theta_0,x}$ is defined to be $\big(\big\|\frac{d\rho_\theta}{d\theta}(\theta_0)\big\|^{(m)}_{\rho_{\theta_0},x}\big)^2$. In particular, the SLD Fisher metric $J_{\theta_0,s}$ is defined as the square of the size of $\frac{d\rho_\theta}{d\theta}(\theta_0)$ based on the SLD inner product at $\rho_{\theta_0}$, i.e., $J_{\theta_0,s} \stackrel{\mathrm{def}}{=} \big(\big\|\frac{d\rho_\theta}{d\theta}(\theta_0)\big\|^{(m)}_{\rho_{\theta_0},s}\big)^2$. The norm of the difference between two quantum states $\rho_\theta$ and $\rho_{\theta+\epsilon}$ is then approximately $\sqrt{J_{\theta,s}}\,\epsilon$.

We can define quantities such as the Bogoljubov Fisher metric $J_{\theta_0,b}$ [1, 3, 4, 8], the RLD Fisher metric $J_{\theta_0,r}$ [2, 7], and the $p$ metric in a similar way for the Bogoljubov, RLD, and $p$ inner products, respectively. Therefore, if $u_1,\ldots,u_k$ is an orthonormal basis in $\mathcal{H}$, the SLD, Bogoljubov, RLD, and $p$ Fisher metrics of the state family $\{\rho_\theta \stackrel{\mathrm{def}}{=} \sum_{i=1}^kp_\theta(i)|u_i\rangle\langle u_i|\,|\,\theta\in\mathbb{R}\}$ are all equal to the Fisher metric for the probability family $\{p_\theta\}$.
Thus, we have a theorem equivalent to Theorem 6.1 as given below.
Thus, we have a theorem equivalent to Theorem 6.1 as given below. 


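Theorem 6.2 Let $\kappa$ be a TP-CP map, and $J_{\theta_0,x,\kappa}$ be the $x = s, b, r, \lambda, p$ Fisher metric for the state family $\{\kappa(\rho_\theta)\,|\,\theta\in\mathbb{R}\}$. The following relation then holds:

$$J_{\theta_0,x} \ge J_{\theta_0,x,\kappa}, \quad x = s, b, r, \lambda, p. \tag{6.29}$$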
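
In the commuting case all of these metrics reduce to the classical Fisher information, and a TP-CP map reduces to a stochastic matrix, so (6.29) can be illustrated with a small classical computation. The sketch below assumes a binomial family on three outcomes and two example channels chosen arbitrarily for this illustration; none of these specific choices comes from the text.

```python
def fisher_information(p, dp):
    """Classical Fisher information: sum of (dp_i)^2 / p_i over the support."""
    return sum(d * d / q for q, d in zip(p, dp) if q > 0)


def apply_channel(W, v):
    """Apply a stochastic matrix; W[y][x] is the probability of y given x."""
    return [sum(W[y][x] * v[x] for x in range(len(v))) for y in range(len(W))]


theta = 0.3
# Binomial family with two trials, and its derivative with respect to theta.
p = [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]
dp = [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]

# Two example channels (each column sums to one).
merge = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 1.0]]   # forgets the difference between outcomes 1 and 2
noisy = [[0.9, 0.2, 0.1],
         [0.1, 0.8, 0.9]]

J = fisher_information(p, dp)
# A channel is linear, so it acts on the derivative in the same way as on p.
J_merge = fisher_information(apply_channel(merge, p), apply_channel(merge, dp))
J_noisy = fisher_information(apply_channel(noisy, p), apply_channel(noisy, dp))
```

For this family `J` equals $2/(\theta(1-\theta))$ and `J_merge` equals $4/(1-\theta^2)$; both processed values lie below `J`, as monotonicity requires.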


When a metric satisfies (6.29), it is called a monotone metric. Because the derivative $\frac{d\rho_\theta}{d\theta}(\theta_0)$ plays an important role in the definition of the metric, $\frac{d\rho_\theta}{d\theta}(\theta_0)$ will be called the m representation of the derivative. We shall also define an operator $L_{\theta_0,x}$ by the relation

$$E_{\rho_{\theta_0},x}(L_{\theta_0,x}) = \frac{d\rho_\theta}{d\theta}(\theta_0). \tag{6.30}$$

Such an operator is called the e representation of the derivative. If all the density matrices $\rho_\theta$ commute with each other, the e representation is the same as a logarithmic derivative. On the other hand, if some of the density matrices $\rho_\theta$ do not commute with each other, their logarithmic derivatives depend on the metric. The matrices $L_{\theta_0,s}$ and $L_{\theta_0,r}$ defined by

$$\frac{d\rho_\theta}{d\theta}(\theta_0) = \frac12\big(\rho_{\theta_0}L_{\theta_0,s} + L_{\theta_0,s}\rho_{\theta_0}\big), \qquad \frac{d\rho_\theta}{d\theta}(\theta_0) = \rho_{\theta_0}L_{\theta_0,r} \tag{6.31}$$

are called the symmetric logarithmic derivative (SLD) and the right logarithmic derivative (RLD), respectively. These matrices coincide with the e representations of the derivative concerning the SLD Fisher metric and the RLD Fisher metric, which are abbreviated to SLD e representation and RLD e representation, respectively.
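
To make the e representations concrete, the following sketch solves (6.30) entrywise at a point where $\rho_{\theta_0}$ is diagonal and then evaluates the SLD, Bogoljubov, and RLD Fisher metrics as $\mathrm{Tr}\,L^*\frac{d\rho_\theta}{d\theta}$. The example family (a qubit whose Bloch vector of length $r$ is rotated by the angle $\theta$) and all helper names are assumptions made for illustration only.

```python
import math


def log_mean(x, y):
    """Logarithmic mean, the weight attached to the Bogoljubov map (6.9)."""
    return x if x == y else (x - y) / (math.log(x) - math.log(y))


# For rho = diag(lam), E_{rho,x} multiplies the (j, k) entry by a weight.
WEIGHTS = {
    "s": lambda a, b: (a + b) / 2,  # SLD
    "b": log_mean,                  # Bogoljubov
    "r": lambda a, b: a,            # RLD: rho L
}


def e_representation(lam, D, x):
    """Solve E_{rho,x}(L) = D entrywise, assuming rho = diag(lam) > 0."""
    w = WEIGHTS[x]
    n = len(lam)
    return [[D[j][k] / w(lam[j], lam[k]) for k in range(n)] for j in range(n)]


def fisher_metric(lam, D, x):
    """J = Tr L^* D = sum over (j, k) of conj(L[j][k]) * D[j][k]."""
    L = e_representation(lam, D, x)
    n = len(lam)
    return sum((complex(L[j][k]).conjugate() * D[j][k]).real
               for j in range(n) for k in range(n))


def matmul(A, B):
    n = len(A)
    return [[sum(A[j][m] * B[m][k] for m in range(n)) for k in range(n)]
            for j in range(n)]


# Example: rho_theta = (I + r(cos(theta) S_3 + sin(theta) S_1)) / 2 at theta = 0.
r = 0.6
lam = [(1 + r) / 2, (1 - r) / 2]
rho = [[lam[0], 0.0], [0.0, lam[1]]]
D = [[0.0, r / 2], [r / 2, 0.0]]      # derivative of rho_theta at theta = 0

L_s = e_representation(lam, D, "s")
L_r = e_representation(lam, D, "r")
# Left-hand sides of the two defining relations in (6.31).
sld_side = [[(matmul(rho, L_s)[j][k] + matmul(L_s, rho)[j][k]) / 2
             for k in range(2)] for j in range(2)]
rld_side = matmul(rho, L_r)

J = {x: fisher_metric(lam, D, x) for x in "sbr"}
```

For this family the three values are $r^2$, $\frac{r}{2}\log\frac{1+r}{1-r}$, and $\frac{r^2}{1-r^2}$, ordered as SLD ≤ Bogoljubov ≤ RLD.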
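Since the equation

$$\int_0^1\rho_\theta^\lambda\frac{d\log\rho_\theta}{d\theta}\rho_\theta^{1-\lambda}\,d\lambda\,\Big|_{\theta=\theta_0} = \frac{d\rho_\theta}{d\theta}\Big|_{\theta=\theta_0} \tag{6.32}$$

holds [8] (Exercise 6.18), the e representation of the derivative of the Bogoljubov Fisher metric $L_{\theta_0,b}$ is then equal to $\frac{d\log\rho_\theta}{d\theta}(\theta_0)$. Since $\mathrm{Tr}\,\frac{d\rho_\theta}{d\theta} = 0$, the e representation $L_{\theta,x}$ satisfies $\mathrm{Tr}\,\rho_\theta L_{\theta,x} = \mathrm{Tr}\,E_{\rho_\theta,x}(L_{\theta,x}) = 0$.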

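Theorem 6.3 For a quantum state family $\{\rho_\theta\,|\,\theta\in\mathbb{R}\}$, the following relations hold [9–12]:

$$\frac18J_{\theta,s} = \lim_{\epsilon\to0}\frac{b^2(\rho_\theta,\rho_{\theta+\epsilon})}{\epsilon^2}, \tag{6.33}$$
$$\frac12J_{\theta,b} = \lim_{\epsilon\to0}\frac{D(\rho_{\theta+\epsilon}\|\rho_\theta)}{\epsilon^2}. \tag{6.34}$$

Hence, we obtain another proof of Theorem 6.1 (Theorem 6.2) for the SLD (Bogoljubov) case by combining Theorem 6.3 and (5.49) ((5.36)).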
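
The constants $\frac18$ and $\frac12$ in (6.33) and (6.34) can be checked in the commuting case, where $b^2(p,q) = 1-\sum_i\sqrt{p_iq_i}$ and $D$ is the classical relative entropy. The sketch below uses a Bernoulli family as an assumed example; the step size `eps` and the function names are illustrative choices.

```python
import math


def bernoulli(theta):
    return [theta, 1.0 - theta]


def fisher_information(theta):
    """Fisher information of the Bernoulli family at theta."""
    return 1.0 / (theta * (1.0 - theta))


def bures_sq(p, q):
    """b^2 for commuting states: one minus the classical fidelity."""
    return 1.0 - sum(math.sqrt(a * b) for a, b in zip(p, q))


def relative_entropy(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


theta, eps = 0.3, 1e-3
p0, p1 = bernoulli(theta), bernoulli(theta + eps)
J = fisher_information(theta)

# (6.33): b^2 is approximately J eps^2 / 8; (6.34): D is approximately J eps^2 / 2.
bures_estimate = 8.0 * bures_sq(p0, p1) / eps ** 2
entropy_estimate = 2.0 * relative_entropy(p1, p0) / eps ** 2
```

Both estimates agree with `J` up to corrections of order `eps`, so shrinking `eps` (while staying above rounding noise) tightens the agreement.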


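Proof Define $U_\epsilon$ such that it satisfies

$$b^2(\rho_\theta,\rho_{\theta+\epsilon}) = \frac12\mathrm{Tr}\,\big(\sqrt{\rho_\theta}-\sqrt{\rho_{\theta+\epsilon}}U_\epsilon\big)\big(\sqrt{\rho_\theta}-\sqrt{\rho_{\theta+\epsilon}}U_\epsilon\big)^*.$$

This can be rewritten as

$$2b^2(\rho_\theta,\rho_{\theta+\epsilon}) = \mathrm{Tr}\,(W(0)-W(\epsilon))(W(0)-W(\epsilon))^*$$
$$\cong \mathrm{Tr}\Big(-\frac{dW}{d\epsilon}(0)\epsilon\Big)\Big(-\frac{dW}{d\epsilon}(0)\epsilon\Big)^* = \mathrm{Tr}\,\frac{dW}{d\epsilon}(0)\frac{dW}{d\epsilon}(0)^*\epsilon^2,$$

where we defined $W(\epsilon) \stackrel{\mathrm{def}}{=} \sqrt{\rho_{\theta+\epsilon}}U_\epsilon$. As will be shown later, the SLD $L_{\theta,s}$ satisfies

$$\frac{dW}{d\epsilon}(0) = \frac12L_{\theta,s}W(0). \tag{6.35}$$

Therefore, $b^2(\rho_\theta,\rho_{\theta+\epsilon}) \cong \mathrm{Tr}\,\frac18L_{\theta,s}W(0)W(0)^*L_{\theta,s}\epsilon^2 = \frac18\mathrm{Tr}\,L_{\theta,s}^2\rho_\theta\,\epsilon^2$, and we obtain (6.33). Thus, showing (6.35) will complete the proof.
From the definition of the Bures distance, we have

$$2b^2(\rho_\theta,\rho_{\theta+\epsilon}) = \min_{U:\text{unitary}}\mathrm{Tr}\,\big(\sqrt{\rho_\theta}-\sqrt{\rho_{\theta+\epsilon}}U\big)\big(\sqrt{\rho_\theta}-\sqrt{\rho_{\theta+\epsilon}}U\big)^*$$
$$= 2-\mathrm{Tr}\,\big(\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\epsilon}}U_\epsilon^* + U_\epsilon\sqrt{\rho_{\theta+\epsilon}}\sqrt{\rho_\theta}\big).$$

Therefore, $\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\epsilon}}U_\epsilon^* = U_\epsilon\sqrt{\rho_{\theta+\epsilon}}\sqrt{\rho_\theta}$. Hence, $W(0)W(\epsilon)^* = W(\epsilon)W(0)^*$. Taking the derivative, we obtain $W(0)\frac{dW}{d\epsilon}(0)^* = \frac{dW}{d\epsilon}(0)W(0)^*$. This shows that there is a Hermitian matrix $L$ satisfying $\frac{dW}{d\epsilon}(0) = \frac12LW(0)$. Since $\rho_{\theta+\epsilon} = W(\epsilon)W(\epsilon)^*$, we have $\frac{d\rho_\theta}{d\theta}(\theta) = \frac12\big(LW(0)W(0)^* + W(0)W(0)^*L\big)$. We therefore see that $L$ is an SLD.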

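We now prove (6.34). Since $L_{\theta,b}$ is equal to $\frac{d\log\rho_\theta}{d\theta}(\theta)$, we have

$$D(\rho_{\theta+\epsilon}\|\rho_\theta) = \mathrm{Tr}\,\big(\rho_{\theta+\epsilon}(\log\rho_{\theta+\epsilon}-\log\rho_\theta)\big)$$
$$\cong \mathrm{Tr}\Big(\rho_\theta+\frac{d\rho_\theta}{d\theta}\epsilon\Big)\Big(\frac{d\log\rho_\theta}{d\theta}\epsilon+\frac12\frac{d^2\log\rho_\theta}{d\theta^2}\epsilon^2\Big)$$
$$\cong \mathrm{Tr}\,(\rho_\theta L_{\theta,b})\,\epsilon + \Big(\mathrm{Tr}\Big(\frac{d\rho_\theta}{d\theta}L_{\theta,b}\Big)+\frac12\mathrm{Tr}\Big(\rho_\theta\frac{d^2\log\rho_\theta}{d\theta^2}\Big)\Big)\epsilon^2. \tag{6.36}$$

The first term on the right-hand side (RHS) may be evaluated as

$$\mathrm{Tr}\,(\rho_\theta L_{\theta,b}) = \int_0^1\mathrm{Tr}\,\big(\rho_\theta^tL_{\theta,b}\rho_\theta^{1-t}\big)\,dt = \mathrm{Tr}\Big(\frac{d\rho_\theta}{d\theta}\Big) = 0. \tag{6.37}$$

Using this equation, we obtain

$$\mathrm{Tr}\Big(\rho_\theta\frac{d^2\log\rho_\theta}{d\theta^2}\Big) = \frac{d}{d\theta}\mathrm{Tr}\Big(\rho_\theta\frac{d\log\rho_\theta}{d\theta}\Big)-\mathrm{Tr}\Big(\frac{d\rho_\theta}{d\theta}\frac{d\log\rho_\theta}{d\theta}\Big) = -\mathrm{Tr}\Big(\frac{d\rho_\theta}{d\theta}L_{\theta,b}\Big) = -J_{\theta,b}. \tag{6.38}$$

Combining (6.36)–(6.38), we obtain $D(\rho_{\theta+\epsilon}\|\rho_\theta) \cong \frac12J_{\theta,b}\epsilon^2$.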


Next, let us consider a quantum state family $\{\rho_\theta\,|\,\theta\in\mathbb{R}^d\}$ with more than one parameter. The derivative at the point $\theta_0 = (\theta_0^1,\ldots,\theta_0^d)$ may be obtained by considering the partial derivatives $\frac{\partial}{\partial\theta^1}\big|_{\theta=\theta_0},\ldots,\frac{\partial}{\partial\theta^d}\big|_{\theta=\theta_0}$ with respect to each parameter. Since each partial derivative represents the size and direction of an infinitesimal transport, it may be regarded as a vector. We then call the vector space comprising these vectors the tangent vector space at $\theta_0$, and its elements tangent vectors. The tangent vector $\frac{\partial}{\partial\theta^j}\big|_{\theta=\theta_0}$ can be represented as a matrix $\frac{\partial\rho_\theta}{\partial\theta^j}(\theta_0)$. This kind of representation of a tangent vector will be called an m representation. The matrix $L_{\theta_0,j,x}$ satisfying $E_{\rho_{\theta_0},x}(L_{\theta_0,j,x}) = \frac{\partial\rho_\theta}{\partial\theta^j}(\theta_0)$ will be called an e representation of the SLD (Bogoljubov, RLD) Fisher metric of $\frac{\partial}{\partial\theta^j}\big|_{\theta=\theta_0}$. The matrix $J_{\theta_0,x} = [J_{\theta_0,x;i,j}]_{i,j}$,

$$J_{\theta_0,x;i,j} \stackrel{\mathrm{def}}{=} \Big\langle\frac{\partial\rho_\theta}{\partial\theta^i}(\theta_0), \frac{\partial\rho_\theta}{\partial\theta^j}(\theta_0)\Big\rangle^{(m)}_{\rho_{\theta_0},x}, \tag{6.39}$$

is called the SLD (Bogoljubov, RLD) Fisher information matrix [2, 7, 8], where $x = s, b, r$ corresponds to SLD, Bogoljubov, RLD, respectively. Note that the tangent vector refers to an infinitesimal change with respect to $\theta$ and is different from the matrix represented by the m representation or e representation. The m representation and the e representation are nothing more than matrix representations of the infinitesimal change.

In summary, in this section we have defined the metric from the inner product given in Sect. 6.1 and investigated the relationship of this metric to the quantum relative entropy $D(\rho\|\sigma)$ and the Bures distance $b(\rho, \sigma)$. We also defined three types of Fisher information matrices for state families with more than one parameter.


Exercises 


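6.14 Define $|\tilde\phi_\theta\rangle \stackrel{\mathrm{def}}{=} \frac{d|\phi_\theta\rangle}{d\theta} - \big\langle\phi_\theta\big|\frac{d\phi_\theta}{d\theta}\big\rangle|\phi_\theta\rangle$ with respect to the pure state family $\{\rho_\theta = |\phi_\theta\rangle\langle\phi_\theta|\}$. Show that the SLD Fisher information $J_{\theta,s}$ is equal to $4\langle\tilde\phi_\theta|\tilde\phi_\theta\rangle$. Show that both the RLD Fisher information and the Bogoljubov Fisher information diverge.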


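6.15 Let $J_\theta^M$ be the Fisher information of the probability family $\{P^M_{\rho_\theta}\,|\,\theta\in\mathbb{R}\}$ (Sect. 1.2) for a one-parameter state family $\{\rho_\theta\,|\,\theta\in\mathbb{R}\}$ and a POVM $M = \{M_i\}$. Show that

$$J_{\theta,x} \ge J_\theta^M \quad\text{for } x = s, r, b, \lambda, p. \tag{6.40}$$

6.16 Show that $J_\theta^M = \sum_i\dfrac{\langle M_i, L_{\theta,s}\rangle^{(e)}_{\rho_\theta,s}\langle L_{\theta,s}, M_i\rangle^{(e)}_{\rho_\theta,s}}{\langle M_i, I\rangle^{(e)}_{\rho_\theta,s}}$ with respect to the POVM $M = \{M_i\}$.

6.17 Show the following facts with respect to the PVM $M = \{M_i\}$ of rank $M_i = 1$.
(a) Using Exercise 6.12, show that $\kappa_{M,\rho,s}(X) = \sum_i\dfrac{\langle M_i, X\rangle^{(e)}_{\rho,s}}{\langle M_i, I\rangle^{(e)}_{\rho,s}}M_i$.

(b) Show that $J_\theta^M = \big(\|\kappa_{M,\rho_\theta,s}(L_{\theta,s})\|^{(e)}_{\rho_\theta,s}\big)^2$.

(c) Assume that $\rho_\theta > 0$. Show that $J_{\theta,s} = J_\theta^M$ if and only if every $M_i$ commutes with $L_{\theta,s}$.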

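6.18 Prove (6.32) following the steps below.

(a) Show that $\displaystyle\int_0^1\lambda^n(1-\lambda)^m\,d\lambda = \frac{n!\,m!}{(n+m+1)!}$.

(b) For a matrix-valued function $X(\theta)$, show that

$$\int_0^1\exp(\lambda X(\theta))\frac{dX(\theta)}{d\theta}\exp((1-\lambda)X(\theta))\,d\lambda = \frac{d\exp(X(\theta))}{d\theta}.$$

This is nothing other than (6.32).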


6.19 Consider the state family { lg | € R} consisting of the n-fold tensor product 
state of the state pg. Show that the metric Jo... of this state family { pe" |0 e€ R} is 
equal to n times the metric Jy, of the state family {p99 € R}, ie., Jox.n = nJo,x for 
x=s,r,b, X, p. 


6.20 Show that the Fisher information matrix $J_{\theta,x}$ is Hermitian.


6.21 Show that the Fisher information matrix $J_{\theta,x}$ is real symmetric for $x = s, \frac12, b$.

6.22 Give an example of an RLD Fisher information matrix $J_{\theta,r}$ that is not real
symmetric.


6.23 For a Hermitian matrix $Y$ and a quantum state family $\{\rho_\theta = e^{-i\theta Y}\rho\, e^{i\theta Y} \mid \theta \in \mathbb{R}\}$,
show that the derivative at $\theta = 0$ has the e representation $i[\log\rho, Y]$ with respect to
the Bogoljubov metric.


6.24 Show that $i[\rho, Y] = E_{\rho,b}(i[\log\rho, Y])$ if $Y$ is Hermitian.


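6.25 Define the state family $\mathcal{S} = \{\rho_\theta = \frac12(I + \sum_{i=1}^3 \theta^i S_i) \mid \|\theta\| \le 1\}$ on the
two-dimensional system $\mathcal{H} = \mathbb{C}^2$. Show that the three Fisher information matrices
$J_{\theta,s}$, $J_{\theta,b}$, $J_{\theta,r}$ can be written as

$$J_{\theta,s}^{-1} = I - |\theta\rangle\langle\theta|, \tag{6.41}$$

$$J_{\theta,b} = \frac{1}{1-\|\theta\|^2}\,\frac{|\theta\rangle\langle\theta|}{\|\theta\|^2} + \frac{1}{2\|\theta\|}\log\frac{1+\|\theta\|}{1-\|\theta\|}\Bigl(I - \frac{1}{\|\theta\|^2}|\theta\rangle\langle\theta|\Bigr), \tag{6.42}$$

$$J_{\theta,r}^{-1} = I - |\theta\rangle\langle\theta| + iR_\theta, \tag{6.43}$$

where $R$ is defined in Sect. 5.3, following the steps below.
(a) Show the following for $\theta = (0, 0, \theta)$, and check (6.41)-(6.43) in this case using
them.

$$L_{\theta,s,1} = S_1, \quad L_{\theta,s,2} = S_2, \quad L_{\theta,r,1} = \tfrac12\rho_\theta^{-1}S_1, \quad L_{\theta,r,2} = \tfrac12\rho_\theta^{-1}S_2,$$

$$L_{\theta,b,1} = \frac{1}{2\theta}\log\frac{1+\theta}{1-\theta}\,S_1, \quad L_{\theta,b,2} = \frac{1}{2\theta}\log\frac{1+\theta}{1-\theta}\,S_2,$$

$$L_{\theta,s,3} = L_{\theta,b,3} = L_{\theta,r,3} = \begin{pmatrix} \frac{1}{1+\theta} & 0 \\ 0 & -\frac{1}{1-\theta} \end{pmatrix},$$

$$J_{\theta,s} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{1-\theta^2} \end{pmatrix}, \quad
J_{\theta,b} = \begin{pmatrix} \frac{1}{2\theta}\log\frac{1+\theta}{1-\theta} & 0 & 0 \\ 0 & \frac{1}{2\theta}\log\frac{1+\theta}{1-\theta} & 0 \\ 0 & 0 & \frac{1}{1-\theta^2} \end{pmatrix}.$$

(b) Show that $O^T J_{O\theta}\, O = J_\theta$, where $O$ is an orthogonal matrix.
(c) Show (6.41)-(6.43) for an arbitrary $\theta$.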


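6.26 Let $J^1_{\theta,x}$ and $J^2_{\theta,x}$ be the Fisher metrics of two state families $\{\rho^1_\theta \mid \theta \in \mathbb{R}\}$ and
$\{\rho^2_\theta \mid \theta \in \mathbb{R}\}$ for $x = s, b, r, \lambda, p$, respectively. Show that the Fisher metric $J_{\theta,x}$ of the
state family $\{\lambda\rho^1_\theta + (1-\lambda)\rho^2_\theta \mid \theta \in \mathbb{R}\}$ satisfies $J_{\theta,x} \le \lambda J^1_{\theta,x} + (1-\lambda)J^2_{\theta,x}$. Show
that its equality holds when the space spanned by the supports of $\frac{d\rho^1_\theta}{d\theta}$ and $\rho^1_\theta$ is
orthogonal to that of $\frac{d\rho^2_\theta}{d\theta}$ and $\rho^2_\theta$.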
6.3 Geodesics and Divergences


In the previous section, we examined the inner product in the space of the quantum
state. In this section, we will examine more advanced geometrical structures such as
parallel transports, exponential family, and divergence. To introduce the concept of a
parallel transport, consider an infinitesimal displacement in a one-parameter quantum
state family $\{\rho_\theta \mid \theta \in \mathbb{R}\}$. The difference between $\rho_{\theta+\epsilon}$ and $\rho_\theta$ is approximately equal
to $\frac{d\rho_\theta}{d\theta}(\theta)\,\epsilon$. Hence, the state $\rho_{\theta+\epsilon}$ can be regarded as the state transported from the
state $\rho_\theta$ in the direction $\frac{d\rho_\theta}{d\theta}(\theta)$ by an amount $\epsilon$. However, if the state $\rho_{\theta+\epsilon}$ coincides
precisely with the state displaced from the state $\rho_\theta$ by $\epsilon$ in the direction of $\frac{d\rho_\theta}{d\theta}(\theta)$,
the infinitesimal displacement at the intermediate states $\rho_{\theta+\epsilon'}$ ($0 < \epsilon' < \epsilon$) must be
equal to the infinitesimal displacement $\frac{d\rho_\theta}{d\theta}(\theta)\,\Delta$ at $\theta$. In such a case, the problem
is to ascertain which infinitesimal displacement at the point $\theta + \epsilon'$ corresponds to
the given infinitesimal displacement $\frac{d\rho_\theta}{d\theta}(\theta)\,\Delta$ at the initial point $\theta$. The rule for
matching the infinitesimal displacement at one point to the infinitesimal displacement
at another point is called parallel transport. The coefficient $\frac{d\rho_\theta}{d\theta}(\theta)$ of the infinitesimal
displacement at $\theta$ is called the tangent vector, as it represents the slope of the tangent
line of the state family $\{\rho_\theta \mid \theta \in \mathbb{R}\}$ at $\theta$. We may therefore consider the parallel
transport of a tangent vector instead of the parallel transport of an infinitesimal
displacement.

Commonly used parallel transports can be classified into those based on the m
representation (m parallel translation) and those based on the e representation (e
parallel translation). The m parallel translation $\Pi^{(m)}_{\rho_\theta,\rho_{\theta'}}$ moves the tangent vector at
one point $\rho_\theta$ to the tangent vector at another point $\rho_{\theta'}$ with the same m representation.
On the other hand, the e parallel translation $\Pi^{(e)}_{x,\rho_\theta,\rho_{\theta'}}$ moves the tangent vector at one
point $\rho_\theta$ with the e representation $L$ to the tangent vector at another point $\rho_{\theta'}$ with the
e representation $L - \mathrm{Tr}\,\rho_{\theta'}L$ [8]. Of course, this definition requires the agreement
between the set of e representations at the point $\theta$ and that at another point $\theta'$.
Hence, this type of e parallel translation is defined only for the symmetric inner
product $\langle X, Y\rangle^{(e)}_{\rho,x}$, and its definition depends on the choice of the metric. Indeed, the
e parallel translation can be regarded as the dual parallel translation of the m parallel
translation concerning the metric $\langle X, Y\rangle^{(e)}_{\rho,x}$ in the following sense:

$$\mathrm{Tr}\,X^*\,\Pi^{(m)}_{\rho_\theta,\rho_{\theta'}}(A) = \mathrm{Tr}\,\Pi^{(e)}_{x,\rho_{\theta'},\rho_\theta}(X)^*\,A,$$

where $X$ is the e representation of a tangent vector at $\rho_{\theta'}$ and $A$ is the m representation
of another tangent vector at $\rho_\theta$.

Further, a one-parameter quantum state family is called a geodesic or an autoparallel
curve when the tangent vector (i.e., the derivative) at each point is given as a
parallel transport of a tangent vector at a fixed point. In particular, the e geodesic
is called a one-parameter exponential family. For example, in an e geodesic with
respect to SLD $\{\rho_\theta \mid \theta \in \mathbb{R}\}$, any state $\rho_\theta$ coincides with the state transported from
the state $\rho_0$ along the autoparallel curve in the direction $L$ by an amount $\theta$, where $L$
denotes the SLD e representation of the derivative at $\rho_0$. We shall henceforth denote
the state as $\Pi^\theta_{L,s}\rho_0$. Similarly, $\Pi^\theta_{L,b}\rho_0$ denotes the state transported autoparallely with
respect to the Bogoljubov e representation from $\rho_0$ in the direction $L$ by an amount $\theta$.

When the given metric is not symmetric, the definition of the e parallel translation
is more complicated. The e parallel translation moves the tangent vector with the
e representation $\tilde L$ at one point $\theta$ to the tangent vector with the e representation
$\tilde L' - \mathrm{Tr}\,\rho_{\theta'}\tilde L'$ at another point $\theta'$ so that the condition $\tilde L + \tilde L^* = \tilde L' + (\tilde L')^*$ holds. That
is, we require the same Hermitian part for the e representation at the different points.
Hence, the e parallel translation $\Pi^{(e)}_{x,\rho_\theta,\rho_{\theta'}}$ coincides with the e parallel translation
$\Pi^{(e)}_{s(x),\rho_\theta,\rho_{\theta'}}$ with regard to its symmetrized inner product $s(x)$.

Therefore, we can define the state transported from the state $\rho_0$ along the autoparallel
curve in the direction with the Hermitian part $L$ by an amount $\theta$ with respect to
RLD ($\lambda$, $p$), and we denote them by $\Pi^\theta_{L,r}\rho_0$ ($\Pi^\theta_{L,\lambda}\rho_0$, $\Pi^\theta_{L,p}\rho_0$). However, only the
SLD one-parameter exponential family $\{\Pi^\theta_{L,s}\rho_0 \mid \theta \in \mathbb{R}\}$ plays an important role in
quantum estimation examined in the next section. Notice that since the symmetrized
inner product $s(r)$ of the RLD $r$ is not the SLD $s$, $\Pi^\theta_{L,s}\rho_0$ is not the same as $\Pi^\theta_{L,r}\rho_0$.


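Lemma 6.1 $\Pi^\theta_{L,s}\sigma$, $\Pi^\theta_{L,b}\sigma$, $\Pi^\theta_{L,r}\sigma$, and $\Pi^\theta_{L,\frac12}\sigma$ can be written in the following form
[8, 13, 14]:

$$\Pi^\theta_{L,s}\sigma = e^{-\mu_s(\theta)}\, e^{\frac{\theta}{2}L}\,\sigma\, e^{\frac{\theta}{2}L}, \tag{6.44}$$

$$\Pi^\theta_{L,b}\sigma = e^{-\mu_b(\theta)}\, e^{\log\sigma + \theta L}, \tag{6.45}$$

$$\Pi^\theta_{L,r}\sigma = e^{-\mu_r(\theta)}\, \sqrt{\sigma}\, e^{\theta L_r}\sqrt{\sigma}, \tag{6.46}$$

$$\Pi^\theta_{L,\frac12}\sigma = e^{-\mu_{\frac12}(\theta)}\, \sigma^{\frac14} e^{\frac{\theta}{2}L_{\frac12}}\sigma^{\frac12} e^{\frac{\theta}{2}L_{\frac12}}\sigma^{\frac14}, \tag{6.47}$$

where we choose Hermitian matrices $L_r$ and $L_{\frac12}$ as $L = \frac12(\sigma^{-\frac12}L_r\sigma^{\frac12} + \sigma^{\frac12}L_r\sigma^{-\frac12})$
and $L = \frac12(\sigma^{-\frac14}L_{\frac12}\sigma^{\frac14} + \sigma^{\frac14}L_{\frac12}\sigma^{-\frac14})$, respectively, and

$$\mu_s(\theta) := \log\mathrm{Tr}\, e^{\frac{\theta}{2}L}\sigma e^{\frac{\theta}{2}L}, \quad \mu_b(\theta) := \log\mathrm{Tr}\, e^{\log\sigma + \theta L}, \tag{6.48}$$

$$\mu_r(\theta) := \log\mathrm{Tr}\,\sqrt{\sigma}\,e^{\theta L_r}\sqrt{\sigma}, \quad \mu_{\frac12}(\theta) := \log\mathrm{Tr}\,\sigma^{\frac14} e^{\frac{\theta}{2}L_{\frac12}}\sigma^{\frac12} e^{\frac{\theta}{2}L_{\frac12}}\sigma^{\frac14}.$$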


Proof When $x = s$, $b$, or $\frac12$, the map $E_{\rho,x}$ is symmetric. Hence, the definition of
$\Pi^\theta_{L,x}\sigma$ implies that

$$\frac{d\,\Pi^\theta_{L,x}\sigma}{d\theta} = E_{\Pi^\theta_{L,x}\sigma,\,x}\bigl(L - \mathrm{Tr}\,L\,\Pi^\theta_{L,x}\sigma\bigr), \quad x = s, b, \tfrac12. \tag{6.49}$$

Since equation (6.49) is an ordinary differential equation, the uniqueness
of its solution guarantees that the only
$\Pi^\theta_{L,x}\sigma$ satisfying $\Pi^0_{L,x}\sigma = \sigma$ is the solution of the above differential equation. Taking
the derivative of the RHS of (6.44), (6.45), and (6.47), we see that the RHS satisfies
(6.49) for $x = s, b, \frac12$, respectively. So, we obtain (6.44), (6.45), and (6.47).

Since the RLD inner product is not symmetric, we need a more careful treatment
for the case of $x = r$. In a one-parameter exponential family $\rho_\theta = \Pi^\theta_{L,r}\sigma$ for
the RLD metric, any RLD e representation at $\sigma$ is written as $\tilde L$, which is not
necessarily Hermitian. Since $E_{\sigma,r}(\tilde L)$ is Hermitian, $\sigma\tilde L$ is Hermitian. So, $\tilde L$ is written
as $\sqrt{\sigma}^{-1}L_r\sqrt{\sigma}$ with a Hermitian matrix $L_r$. As the RLD e representation at $\sigma$ has the
form $\sqrt{\sigma}^{-1}L_r\sqrt{\sigma}$ with a Hermitian matrix $L_r$, we only discuss the family $\rho_\theta$ given by the RHS of (6.46).
By taking its derivative, we have

$$\frac{d\rho_\theta}{d\theta} = E_{\rho_\theta,r}\bigl(\sqrt{\sigma}^{-1}L_r\sqrt{\sigma} - \mathrm{Tr}\,\rho_\theta\sqrt{\sigma}^{-1}L_r\sqrt{\sigma}\bigr). \tag{6.50}$$

On the RHS of (6.46), the RLD e representation of the derivative at each point is
equal to the parallel transported e representation of the derivative $\sqrt{\sigma}^{-1}L_r\sqrt{\sigma}$ at $\sigma$.
So, we find that the state family (6.46) satisfies this condition. Similarly, the
uniqueness of the solution of the ordinary differential equation (6.50) guarantees that
only the state family (6.46) satisfies this condition. So, we obtain (6.46). ∎
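The differential equation (6.49) for the SLD case can be checked numerically. The following is a minimal sketch, assuming NumPy, a randomly generated full-rank state and Hermitian direction (the helper names `herm_fun`, `sld_geodesic`, and `random_state` are hypothetical, not from the text), and a central finite difference in place of the exact derivative.

```python
import numpy as np

def herm_fun(a, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def sld_geodesic(sigma, L, theta):
    """Normalized e^{theta L/2} sigma e^{theta L/2}, the form stated in (6.44)."""
    g = herm_fun(L, lambda w: np.exp(theta * w / 2))
    m = g @ sigma @ g
    return m / np.trace(m).real

def random_state(d, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = a @ a.conj().T + 0.1 * np.eye(d)
    return m / np.trace(m).real

rng = np.random.default_rng(0)
d = 3
sigma = random_state(d, rng)
h = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
L = (h + h.conj().T) / 2              # Hermitian direction

theta, eps = 0.7, 1e-5
rho = sld_geodesic(sigma, L, theta)

# Left side of (6.49): derivative of the curve, by central difference.
lhs = (sld_geodesic(sigma, L, theta + eps)
       - sld_geodesic(sigma, L, theta - eps)) / (2 * eps)

# Right side of (6.49) for x = s: symmetrized product of the centered L with rho.
Lc = L - np.trace(L @ rho).real * np.eye(d)
rhs = (Lc @ rho + rho @ Lc) / 2

residual = np.linalg.norm(lhs - rhs)
print("residual of (6.49):", residual)
```

The residual should be at the level of the finite-difference error, which is consistent with the curve (6.44) solving (6.49) with the initial condition $\Pi^0_{L,s}\sigma = \sigma$.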


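Now, using the concept of the exponential family, we extend the divergence based
on the first equation in (2.129). For any two states $\rho$ and $\sigma$, we choose the Hermitian
matrix $L$ such that the exponential family $\{\Pi^\theta_{L,x}\sigma\}_{\theta\in[0,1]}$ with regard to the inner
product $J_{\theta,x}$ satisfies

$$\Pi^1_{L,x}\sigma = \rho. \tag{6.51}$$

Then, we define the x-e-divergences as follows:

$$D^{(e)}_x(\rho\|\sigma) := \int_0^1 J_{\theta,x}\,\theta\,d\theta, \tag{6.52}$$

where $J_{\theta,x}$ is the Fisher information for the exponential family $\Pi^\theta_{L,x}\sigma$. Since we can
show that

$$J_{\theta,x} = \frac{d^2\mu_x(\theta)}{d\theta^2} \tag{6.53}$$

(for $x = b$, see (6.32)), $D^{(e)}_x(\rho\|\sigma)$ can be regarded as the Bregman divergence of
$\mu_x(\theta)$, i.e.,

$$D^{(e)}_x(\rho\|\sigma) = D^{\mu_x}(1\|0). \tag{6.54}$$

Since $\Pi^\theta_{L_1\otimes I + I\otimes L_2,\,x}(\sigma_1\otimes\sigma_2)$ equals $(\Pi^\theta_{L_1,x}\sigma_1)\otimes(\Pi^\theta_{L_2,x}\sigma_2)$,

$$D^{(e)}_x(\rho_1\otimes\rho_2\|\sigma_1\otimes\sigma_2) = D^{(e)}_x(\rho_1\|\sigma_1) + D^{(e)}_x(\rho_2\|\sigma_2), \tag{6.55}$$

i.e., the e-divergence satisfies the additivity for any inner product.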
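The passage from the integral definition (6.52) to the Bregman form (6.54) is a single integration by parts. As a worked step, assuming (6.53), i.e. that the Fisher information along the exponential family is the second derivative $\mu_x''(\theta)$ of the potential:

$$\int_0^1 \mu_x''(\theta)\,\theta\,d\theta = \bigl[\theta\,\mu_x'(\theta)\bigr]_0^1 - \int_0^1 \mu_x'(\theta)\,d\theta = \mu_x'(1)\,(1-0) - \mu_x(1) + \mu_x(0).$$

The right-hand side is the Bregman divergence of $\mu_x$ between the parameter values 1 and 0. Since $\mu_x(0) = \log\mathrm{Tr}\,\sigma = 0$, and $\mu_x(1) = 0$ whenever $L$ is normalized so that the curve reaches $\rho$ at $\theta = 1$ with unit trace, the divergence then reduces to the slope $\mu_x'(1)$.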




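Theorem 6.4 When

$$L = \begin{cases}
2\log\sigma^{-\frac12}(\sigma^{\frac12}\rho\,\sigma^{\frac12})^{\frac12}\sigma^{-\frac12} & \text{for } x = s,\\
\log\rho - \log\sigma & \text{for } x = b,\\
\frac12\bigl(\sigma^{-\frac12}\log(\sigma^{-\frac12}\rho\,\sigma^{-\frac12})\,\sigma^{\frac12} + \sigma^{\frac12}\log(\sigma^{-\frac12}\rho\,\sigma^{-\frac12})\,\sigma^{-\frac12}\bigr) & \text{for } x = r,\\
\sigma^{-\frac14}\log(\sigma^{-\frac14}\rho^{\frac12}\sigma^{-\frac14})\,\sigma^{\frac14} + \sigma^{\frac14}\log(\sigma^{-\frac14}\rho^{\frac12}\sigma^{-\frac14})\,\sigma^{-\frac14} & \text{for } x = \tfrac12,
\end{cases} \tag{6.56}$$

condition (6.51) holds. Hence, we obtain

$$D^{(e)}_s(\rho\|\sigma) = 2\,\mathrm{Tr}\,\rho\log\sigma^{-\frac12}(\sigma^{\frac12}\rho\,\sigma^{\frac12})^{\frac12}\sigma^{-\frac12}, \tag{6.57}$$

$$D^{(e)}_b(\rho\|\sigma) = \mathrm{Tr}\,\rho(\log\rho - \log\sigma) = D(\rho\|\sigma), \tag{6.58}$$

$$D^{(e)}_r(\rho\|\sigma) = \mathrm{Tr}\,\rho\log(\rho^{\frac12}\sigma^{-1}\rho^{\frac12}), \tag{6.59}$$

$$D^{(e)}_{\frac12}(\rho\|\sigma) = 2\,\mathrm{Tr}\,(\sigma^{\frac14}\rho^{\frac12}\sigma^{\frac14})(\sigma^{-\frac14}\rho^{\frac12}\sigma^{-\frac14})\log(\sigma^{-\frac14}\rho^{\frac12}\sigma^{-\frac14}). \tag{6.60}$$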


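Proof When we substitute (6.56) into $L$, condition (6.51) can be checked by using
Lemma 6.1. In this case, $L_r = \log(\sigma^{-\frac12}\rho\,\sigma^{-\frac12})$ and $L_{\frac12} = 2\log(\sigma^{-\frac14}\rho^{\frac12}\sigma^{-\frac14})$. From
(6.54), we can prove that

$$D^{(e)}_x(\rho\|\sigma) = \frac{d\mu_x(\theta)}{d\theta}\Big|_{\theta=1}(1-0) - \mu_x(1) + \mu_x(0) = \frac{d\mu_x(\theta)}{d\theta}\Big|_{\theta=1},$$

where $\mu_x(\theta)$ is defined in Lemma 6.1. Using this relation, we can check (6.57), (6.58),
and (6.60). Concerning (6.59), we obtain

$$D^{(e)}_r(\rho\|\sigma) = \mathrm{Tr}\,\sigma\,\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\log(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}) = \mathrm{Tr}\,\rho\log(\rho^{\frac12}\sigma^{-1}\rho^{\frac12}),$$

where the last equation follows from Exercise A.2. ∎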
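The endpoint condition (6.51) and the closed forms of Theorem 6.4 lend themselves to a numerical sanity check. The sketch below assumes NumPy and random full-rank states; all helper names (`herm_fun`, `powm`, `logm`, `random_state`, `tr`) are hypothetical. It also compares $D^{(e)}_s$ with the classical relative entropy of the measurement in the eigenbasis of $\sigma^{-1/2}(\sigma^{1/2}\rho\,\sigma^{1/2})^{1/2}\sigma^{-1/2}$, and checks the orderings $D^{(e)}_s \le D \le D^{(e)}_r$ on this one instance.

```python
import numpy as np

def herm_fun(a, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    a = (a + a.conj().T) / 2          # remove rounding-level asymmetry
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def powm(a, p):
    return herm_fun(a, lambda w: w ** p)

def logm(a):
    return herm_fun(a, np.log)

def random_state(d, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = g @ g.conj().T + 0.5 * np.eye(d)
    return m / np.trace(m).real

def tr(a):
    return np.trace(a).real

rng = np.random.default_rng(1)
rho, sigma = random_state(3, rng), random_state(3, rng)
s_h, s_mh = powm(sigma, 0.5), powm(sigma, -0.5)
s_q, s_mq = powm(sigma, 0.25), powm(sigma, -0.25)
r_h = powm(rho, 0.5)

G = s_mh @ powm(s_h @ rho @ s_h, 0.5) @ s_mh   # e^{L/2} for x = s
R = s_mh @ rho @ s_mh                          # e^{L_r} for x = r
K = s_mq @ r_h @ s_mq                          # e^{L_{1/2}/2} for x = 1/2

# Endpoint condition (6.51): the curve at theta = 1 should equal rho.
endpoint_err = {
    "s": np.linalg.norm(G @ sigma @ G - rho),
    "r": np.linalg.norm(s_h @ R @ s_h - rho),
    "half": np.linalg.norm(s_q @ K @ s_h @ K @ s_q - rho),
}

D_s = 2 * tr(rho @ logm(G))                                   # (6.57)
D_b = tr(rho @ (logm(rho) - logm(sigma)))                     # (6.58)
D_r = tr(rho @ logm(r_h @ np.linalg.inv(sigma) @ r_h))        # (6.59)
D_half = 2 * tr(s_q @ r_h @ s_q @ K @ logm(K))                # (6.60)

# Classical relative entropy of the measurement in the eigenbasis of G.
_, v = np.linalg.eigh((G + G.conj().T) / 2)
p = np.einsum("ij,ik,kj->j", v.conj(), rho, v).real
q = np.einsum("ij,ik,kj->j", v.conj(), sigma, v).real
D_meas = float(np.sum(p * np.log(p / q)))

print(endpoint_err, D_s, D_b, D_r, D_half, D_meas)
```

A single random instance proves nothing, but a disagreement here would signal a transcription error in one of the formulas.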


Now we compare these quantum analogs of relative entropy given in (6.57)-(6.60).
As is easily checked, these satisfy condition (3.104) for quantum analogs of relative
entropy. Also, Inequality (5.47) shows that

$$D(\rho\|\sigma) = D^{(e)}_b(\rho\|\sigma) \ge 2\,\mathrm{Tr}\,\rho\log\sigma^{-\frac12}(\sigma^{\frac12}\rho\,\sigma^{\frac12})^{\frac12}\sigma^{-\frac12} = D^{(e)}_s(\rho\|\sigma). \tag{6.62}$$

An alternative proof of the above relations is available in Exercise 6.30. Hence, from
inequality (3.107) and additivity (6.55), $D^{(e)}_s(\rho\|\sigma)$ does not satisfy the monotonicity
even for measurements because the equality in the first inequality in (6.62) does not
always hold.

Further, we can extend the divergence based on equation (2.136). For any two
states $\rho$ and $\sigma$, the family $\{(1-t)\rho + t\sigma \mid 0 \le t \le 1\}$ is the m geodesic joining $\rho$ and
$\sigma$. Hence, as an extension of (2.136), we can define the x-m divergences as

$$D^{(m)}_x(\rho\|\sigma) := \int_0^1 J_{t,x}\,t\,dt. \tag{6.63}$$

Since the family $\{(1-t)\kappa(\rho) + t\kappa(\sigma) \mid 0 \le t \le 1\}$ is the m geodesic joining $\kappa(\rho)$
and $\kappa(\sigma)$ for any TP-CP map $\kappa$, we have

$$D^{(m)}_x(\rho\|\sigma) \ge D^{(m)}_x(\kappa(\rho)\|\kappa(\sigma)), \tag{6.64}$$

i.e., the m divergence satisfies the monotonicity. Since the RLD is the largest inner
product,

$$D^{(m)}_r(\rho\|\sigma) \ge D^{(m)}_x(\rho\|\sigma). \tag{6.65}$$

We can calculate the m divergences as follows (Exercises 6.29 and 6.31):

$$D^{(m)}_b(\rho\|\sigma) = \mathrm{Tr}\,\rho(\log\rho - \log\sigma) = D(\rho\|\sigma), \tag{6.66}$$

$$D^{(m)}_r(\rho\|\sigma) = \mathrm{Tr}\,\rho\log(\sqrt{\rho}\,\sigma^{-1}\sqrt{\rho}). \tag{6.67}$$

The Bogoljubov case follows from Theorem 6.5. Hence, from (6.64),
$\mathrm{Tr}\,\rho\log(\sqrt{\rho}\,\sigma^{-1}\sqrt{\rho}) = D^{(m)}_r(\rho\|\sigma)$ satisfies the monotonicity for TP-CP maps. Further, from
(6.65) we obtain $\mathrm{Tr}\,\rho\log(\sqrt{\rho}\,\sigma^{-1}\sqrt{\rho}) \ge D(\rho\|\sigma)$ [15].
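The RLD m divergence can be evaluated directly from its integral definition and compared with the closed form. The sketch below is an illustration under stated assumptions: NumPy, random full-rank states, the RLD Fisher information along the mixture path written as $\mathrm{Tr}\,(\sigma-\rho)^2\rho_t^{-1}$ with $\rho_t = \rho + t(\sigma-\rho)$, and Gauss-Legendre quadrature for the $t$ integral. Helper names are hypothetical.

```python
import numpy as np

def herm_fun(a, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    a = (a + a.conj().T) / 2
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def logm(a):
    return herm_fun(a, np.log)

def random_state(d, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = g @ g.conj().T + 0.5 * np.eye(d)
    return m / np.trace(m).real

rng = np.random.default_rng(2)
rho, sigma = random_state(3, rng), random_state(3, rng)
delta = sigma - rho

# Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
nodes, weights = np.polynomial.legendre.leggauss(60)
ts, ws = (nodes + 1) / 2, weights / 2

# Integral of J_{t,r} * t over the m geodesic, cf. (6.63).
integral = sum(
    w * t * np.trace(delta @ delta @ np.linalg.inv(rho + t * delta)).real
    for t, w in zip(ts, ws)
)

# Closed form Tr rho log(sqrt(rho) sigma^{-1} sqrt(rho)), cf. (6.67).
sqrt_rho = herm_fun(rho, np.sqrt)
closed = np.trace(rho @ logm(sqrt_rho @ np.linalg.inv(sigma) @ sqrt_rho)).real

# Quantum relative entropy D(rho || sigma) for comparison with (6.65).
umegaki = np.trace(rho @ (logm(rho) - logm(sigma))).real

print(integral, closed, umegaki)
```

On such an instance the quadrature value and the closed form should agree to high precision, and both should be at least as large as $D(\rho\|\sigma)$.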

Not all x-m divergences necessarily satisfy additivity (6.55). This fact can be
shown as follows. Choose an inner product $J_{x,\theta}$ different from the Bogoljubov inner
product $J_{b,\theta}$ such that $J_{x,\theta} \le J_{b,\theta}$. Then we have $D(\rho\|\sigma) \ge D^{(m)}_x(\rho\|\sigma)$. Further, since
the inner product $J_{x,\theta}$ is different from the Bogoljubov inner product $J_{b,\theta}$, there exists a pair
of states $\rho$ and $\sigma$ such that $D(\rho\|\sigma) > D^{(m)}_x(\rho\|\sigma)$. From (3.107) and monotonicity
(6.64), $D^{(m)}_x(\rho\|\sigma)$ does not satisfy additivity (6.55). For example, since the SLD Fisher
information satisfies the above conditions, the SLD m divergence does not satisfy
additivity (6.55).

Now, we consider whether it is possible in two-parameter state families to have
states that are e autoparallel transported in the direction of $L_1$ by $\theta^1$ and in the
direction $L_2$ by $\theta^2$. In order to define such a state, we require that the following two
states coincide with each other: (1) the state that is e autoparallel transported first
in the $L_1$ direction by $\theta^1$ from $\rho_0$, then further e autoparallel transported in the $L_2$
direction by $\theta^2$, and (2) the state that is e autoparallel transported in the $L_2$ direction
by $\theta^2$ from $\rho_0$, then e autoparallel transported in the $L_1$ direction by $\theta^1$. That is, if
such a state were defined, the relation

$$\Pi^{\theta^2}_{L_2,x}\Pi^{\theta^1}_{L_1,x}\sigma = \Pi^{\theta^1}_{L_1,x}\Pi^{\theta^2}_{L_2,x}\sigma \tag{6.68}$$

should hold. Otherwise, the torsion $T(L_1, L_2)_{\rho,x}$ is defined as follows (Fig. 6.1):

$$T(L_1, L_2)_{\rho,x} := \lim_{\epsilon\to 0}\frac{\Pi^{\epsilon}_{L_2,x}\Pi^{\epsilon}_{L_1,x}\rho - \Pi^{\epsilon}_{L_1,x}\Pi^{\epsilon}_{L_2,x}\rho}{\epsilon^2}.$$

Concerning condition (6.68), we have the following theorem. 


[Fig. 6.1 Torsion. Left panel: torsion exists. Right panel: torsion-free.]

Theorem 6.5 (Amari and Nagaoka [8]) The following conditions for the inner product
$J_{\theta,x}$ are equivalent.

① $J_{\theta,x}$ is the Bogoljubov inner product, i.e., $x = b$.
② Condition (6.68) holds for any two Hermitian matrices $L_1$ and $L_2$ and any state $\rho_0$.
③ $D^{(e)}_x(\rho_{\bar\theta}\|\rho_\theta) = D^{\mu}(\bar\theta\|\theta)$.
④ $D^{(e)}_x(\rho\|\sigma) = D(\rho\|\sigma)$.
⑤ $D^{(m)}_x(\rho_{\bar\eta}\|\rho_\eta) = D^{\nu}(\bar\eta\|\eta)$.
⑥ $D^{(m)}_x(\rho\|\sigma) = D(\rho\|\sigma)$.

Here, the convex functions $\mu(\theta)$ and $\nu(\eta)$ and the states $\rho_\theta$ and $\rho_\eta$ are defined by

$$\rho_\theta := \exp\Bigl(\sum_i \theta^i X_i - \mu(\theta)\Bigr), \quad \mu(\theta) := \log\mathrm{Tr}\exp\Bigl(\sum_i \theta^i X_i\Bigr), \tag{6.69}$$

$$\rho_\eta := \rho_{\mathrm{mix}} + \sum_j \eta_j Y^j, \quad \nu(\eta) := D^{(m)}_x(\rho_0\|\rho_\eta) = -H(\rho_\eta) + H(\rho_{\mathrm{mix}}),$$

where $X_1, \ldots, X_k$ is a basis of the set of traceless Hermitian matrices, and
$Y^1, \ldots, Y^k$ is its dual basis.

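Proof First, we prove ①⇒②. Theorem 6.1 guarantees that the Bogoljubov e autoparallel
transport satisfies

$$\Pi^{\theta^2}_{L_2,b}\Pi^{\theta^1}_{L_1,b}\rho = \Pi^{\theta^1}_{L_1,b}\Pi^{\theta^2}_{L_2,b}\rho = e^{-\mu_b(\theta^1,\theta^2)}\,e^{\log\rho + \theta^1L_1 + \theta^2L_2},$$

where $\mu_b(\theta) := \log\mathrm{Tr}\,e^{\log\rho + \theta^1L_1 + \theta^2L_2}$. Hence, we obtain ②.

Next, we prove that ②⇒③. We define $\tilde\rho_\theta := \Pi^{\theta^k}_{X_k,x}\cdots\Pi^{\theta^1}_{X_1,x}\rho_{\mathrm{mix}}$ for $\theta = (\theta^1, \ldots, \theta^k)$.
Then, condition ② guarantees that $\tilde\rho_{\bar\theta} = \Pi^1_{\sum_i(\bar\theta^i - \theta^i)X_i,\,x}\tilde\rho_\theta$. In particular,
when $\theta = 0$, we obtain $\tilde\rho_{\bar\theta} = \Pi^1_{\sum_i\bar\theta^iX_i,\,x}\rho_{\mathrm{mix}}$. Since $\sum_i\bar\theta^iX_i$ is commutative with
$\rho_{\mathrm{mix}}$, we can apply the classical observation to this case. Hence, state $\tilde\rho_{\bar\theta}$ coincides
with state $\rho_{\bar\theta}$ defined in (6.69).

Let $\tilde X_{j,\theta}$ be the x-e representation of the partial derivative concerning $\theta^j$ at $\rho_\theta$. It
can be expressed by using a skew-Hermitian matrix $X'_{j,\theta}$ as

$$\tilde X_{j,\theta} = X_j - \mathrm{Tr}\,\rho_\theta X_j + X'_{j,\theta}.$$

Thus,

$$\frac{\partial\,\mathrm{Tr}\,\rho_\theta X_j}{\partial\theta^i} = \mathrm{Tr}\,\frac{\partial\rho_\theta}{\partial\theta^i}\,(X_j - \mathrm{Tr}\,\rho_\theta X_j)
= \mathrm{Re}\,\mathrm{Tr}\,\frac{\partial\rho_\theta}{\partial\theta^i}\,(X_j - \mathrm{Tr}\,\rho_\theta X_j + X'_{j,\theta}) = \mathrm{Re}\,J_{\theta,x;i,j}.$$

Note that the trace of the product of a Hermitian matrix and a skew-Hermitian matrix
is an imaginary number. Since $\mathrm{Re}\,J_{\theta,x;i,j} = \mathrm{Re}\,J_{\theta,x;j,i}$, we have $\frac{\partial\,\mathrm{Tr}\,\rho_\theta X_j}{\partial\theta^i} = \frac{\partial\,\mathrm{Tr}\,\rho_\theta X_i}{\partial\theta^j}$.
Thus, there exists a function $\tilde\mu(\theta)$ such that $\tilde\mu(0) = \mu(0)$ and

$$\frac{\partial\tilde\mu(\theta)}{\partial\theta^j} = \mathrm{Tr}\,\rho_\theta X_j.$$

This function $\tilde\mu$ satisfies the condition ③.

Moreover, since $\mathrm{Tr}\,\rho_{\mathrm{mix}}X_i = 0$, from definition (2.116), we have $\tilde\mu(\theta) - \tilde\mu(0) =
D^{\tilde\mu}(0\|\theta)$. Since the completely mixed state $\rho_{\mathrm{mix}}$ commutes with the state $\rho_\theta$, the
relation $D^{(e)}_x(\rho_{\mathrm{mix}}\|\rho_\theta) = \mu(\theta) - \mu(0)$ holds. Hence, we obtain $\tilde\mu(\theta) = \mu(\theta)$.

Further, we have $D^{\mu}(\bar\theta\|\theta) = D(\rho_{\bar\theta}\|\rho_\theta)$. Thus, the equivalence between ③ and ④ is
trivial since the limit of $D(\rho_{\bar\theta}\|\rho_\theta)$ equals the Bogoljubov inner product $J_{b,\theta}$. Hence,
we obtain ④⇒①.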

Now we proceed to the proof of O+@+@+@=>6©. In this case, the function 
y(n) coincides with the Legendre transform of (6), and 7; = ay (0). Hence, 


oy 
D’(y\ln) = D#(6||0) = D(pj% || pn). The second el matrix 


ae Dy = coincides 


with the inverse of the second derivative matrix anor sai , which equals the Bogoljubov 
Fisher information matrix concerning the parameter @. Since the Bogoljubov Fisher 
information matrix concerning the parameter 7) equals the inverse of the Bogoljubov 
Fisher information matrix concerning the parameter 0, the Bogoljubov Fisher infor- 
as matrix concerning the parameter 77 coincides with the second derivative matrix 
Bobat 3n ;- Hence, the relation (2.118) guarantees that D’ (n||7) = D}” (pallpo)- 

Next, we prove @=>©@. Since pmix = po commutes with p,, the m divergence 
D (po||py) coincides with the Bogoljubov m divergence D}”” (po|| 0,,), which equals 
the Legendre transform of ,:(0) defined in (6.69). Thus, pD™ (PrllPn) = D’ (nll = 
D (pj || Pn). Finally, taking the limit 7 > 1, we obtain J, = Jp, i.¢.,©>0. Mf 
This theorem shows that the Bogoljubov inner product is most natural from a geometrical
point of view. However, from the viewpoint of estimation theory, the Bogoljubov
metric is rather inconvenient, as will be shown in the next section.

In summary, this section has examined several geometrical structures that may be 
derived from the inner product. In the next section, we will discuss the connection 
between these structures and estimation theory. 


Exercises 


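6.27 Define the SLD $L := S_1$ and the state $\rho_0 := \frac12(I + \sum_{i=1}^3 x_iS_i)$ in the
two-dimensional system $\mathcal{H} = \mathbb{C}^2$. Show that the SLD e geodesic $\Pi^t_{L,s}\rho_0$ is given by

$$\Pi^t_{L,s}\rho_0 = \frac12\Bigl(I + \sum_{i=1}^3 x_i(t)S_i\Bigr), \quad x_1(t) = \frac{e^t(1+x_1) - e^{-t}(1-x_1)}{e^t(1+x_1) + e^{-t}(1-x_1)},$$

$$x_2(t) = \frac{2x_2}{e^t(1+x_1) + e^{-t}(1-x_1)}, \quad x_3(t) = \frac{2x_3}{e^t(1+x_1) + e^{-t}(1-x_1)}.$$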


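6.28 Show that an arbitrary SLD e geodesic on the two-dimensional system $\mathcal{H} = \mathbb{C}^2$
is unitarily equivalent to $S_a$ if a suitable $a \in [0, 1]$ is chosen, where

$$S_a := \Bigl\{\frac12\begin{pmatrix} 1 + \frac{a}{\cosh t} & \tanh t \\ \tanh t & 1 - \frac{a}{\cosh t} \end{pmatrix} \Bigm| t \in \mathbb{R}\Bigr\} \quad [16].$$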


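6.29 Show equation (6.67) following the steps below.

(a) Show the equation $\int_0^1 X^2(I + tX)^{-1}\,t\,dt = X - \log(I + X)$ for any Hermitian
matrix $X$.

(b) Show the equation $\int_0^1 \mathrm{Tr}\,(\sigma - \rho)^2(\rho + t(\sigma - \rho))^{-1}\,t\,dt = \mathrm{Tr}\,\rho\log(\sqrt{\rho}\,\sigma^{-1}\sqrt{\rho})$.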


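6.30 Let $M$ be a measurement corresponding to the spectral decomposition of
$\sigma^{-1/2}(\sigma^{1/2}\rho\,\sigma^{1/2})^{1/2}\sigma^{-1/2}$. Show that $D^{(e)}_s(\rho\|\sigma) = D(P^M_\rho\|P^M_\sigma)$ [14]. Show that
$D^{(e)}_s(\rho\|\sigma) \ge -2\log\mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}|$ from Exercise 3.21 and (2.26).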


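6.31 Show equation (6.66) following the steps below [17, 18].
(a) Show that

$$\int_0^1 \frac{d\log\rho_t}{dt}\frac{d\rho_t}{dt}\,t\,dt
= \Bigl[(\log\rho_t)\frac{d\rho_t}{dt}\,t\Bigr]_0^1 - \int_0^1 (\log\rho_t)\frac{d^2\rho_t}{dt^2}\,t\,dt - \bigl[(\log\rho_t)\rho_t\bigr]_0^1 + \int_0^1 \frac{d\log\rho_t}{dt}\rho_t\,dt.$$

(b) Show that $J_{t,b} = \mathrm{Tr}\,\frac{d\log\rho_t}{dt}\frac{d\rho_t}{dt}$ for the Bogoljubov metric.

(c) Show that $\mathrm{Tr}\,\frac{d\log\rho_t}{dt}\rho_t = 0$.

(d) Show (6.66) for the m geodesic $\rho_t = t\sigma + (1-t)\rho$ connecting two states $\rho$ and
$\sigma$.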


6.32 Show that the following three conditions are equivalent for two states $\rho$ and
$\sigma$ and a PVM $M = \{M_i\}$ of rank $M_i = 1$ [17, 18]. The equivalence of ① and ③ is
nothing other than Theorem 3.6.

① $D(\rho\|\sigma) = D(P^M_\rho\|P^M_\sigma)$.
② The m geodesic $\rho_\theta = \theta\sigma + (1-\theta)\rho$ satisfies $J_{\theta,b} = J^M_\theta$ for all $\theta \in [0, 1]$.
③ $[\sigma, \rho] = 0$, and there exists a set of real numbers $\{a_i\}_{i=1}^d$ satisfying

$$\rho = \sigma\Bigl(\sum_{i=1}^d a_iM_i\Bigr) = \Bigl(\sum_{i=1}^d a_iM_i\Bigr)\sigma. \tag{6.70}$$


6.33 Show that $\lim_{n\to\infty}\frac1n D^{(m)}_x(\rho^{\otimes n}\|\sigma^{\otimes n}) = D(\rho\|\sigma)$ when $J_{\theta,x} \le J_{\theta,b}$.


6.4 Quantum State Estimation 


In Chap. 2, we only considered the case of two hypotheses existing for the quantum
states. In this section, we will consider the problem of efficiently estimating an
unknown quantum state that is included in a state family $\{\rho_\theta \mid \theta \in \mathbb{R}\}$ by performing a
measurement. The goal is to find $\theta$. We assume that a system has been prepared with
n identical states in a similar manner to Chap. 2. In this case, the estimator is denoted
by the pair $(M^n, \hat\theta_n)$, where $M^n$ is a POVM representing the measurement on the
quantum system $\mathcal{H}^{\otimes n}$ (with the probability space $\Omega_n$, which is the set of possible
outcomes) and $\hat\theta_n$ is the map from $\Omega_n$ to the parameter space. In a similar way to the
estimation of the probability distributions examined in Sect. 2.3, we assume the mean
square error to be the measure of the error. If the parameter space is one-dimensional,
an estimator with a smaller mean square error (MSE)

$$V_\theta(M^n, \hat\theta_n) := \sum_\omega (\hat\theta_n(\omega) - \theta)^2\,\mathrm{Tr}\,\rho_\theta^{\otimes n}M^n(\omega) \tag{6.71}$$

results in a better estimation. We may then ask: what kind of estimator is
most appropriate for estimating the unknown state? One method is the following.
First, we choose a POVM $M$, and perform the corresponding measurement n times.
Then, the problem is reduced to the estimation in the probability distribution family
$\{P^M_{\rho_\theta} \mid \theta \in \mathbb{R}\}$. Second, we apply the maximum likelihood estimator $\hat\theta_{n,ML}$ of the
probability distribution family to the obtained n outcomes. The mean square error
is approximately $\frac1n(J^M_\theta)^{-1}$ in the asymptotic limit according to the discussion in
Sect. 2.3, where $J^M_\theta$ is the Fisher information at $\theta$ for the probability distribution family
$\{P^M_{\rho_\theta} \mid \theta \in \mathbb{R}\}$.

According to this argument there exists some arbitrariness in the choice of measurement
$M$. The essential point of quantum estimation is therefore to optimize this
estimation procedure, including the choice of the POVM $M$. It is evident that certain
conditions must be imposed on the estimators. For example, consider an estimator
$\hat\theta$ that always gives the value 0. If the true parameter is 0, the mean squared error is
0. On the other hand, if the true parameter is not 0, the mean squared error becomes
large. Such an estimator is clearly not useful; this indicates that the
formulation of the optimization problem for our estimator must be considered more
carefully. A simple example of such a condition is the unbiasedness condition
$$E_\theta(M^n, \hat\theta_n) := \sum_\omega \hat\theta_n(\omega)\,\mathrm{Tr}\,\rho_\theta^{\otimes n}M^n(\omega) = \theta, \quad \forall\theta \in \Theta. \tag{6.72}$$

However, in general, the unbiasedness condition is too restrictive. In order to avoid
this, we often consider the locally unbiased condition:

$$E_\theta(M^n, \hat\theta_n) = \theta, \quad \frac{dE_\theta(M^n, \hat\theta_n)}{d\theta} = 1 \tag{6.73}$$

at a fixed point $\theta \in \Theta$. However, since this condition depends on the true parameter,
it is not so natural. As an intermediate condition, we often treat the asymptotic case,
i.e., the asymptotic behavior when the number n of prepared spaces goes to infinity.
In this case, the asymptotic unbiasedness condition:

$$\lim_{n\to\infty} E_\theta(M^n, \hat\theta_n) = \theta, \quad \lim_{n\to\infty}\frac{d}{d\theta}E_\theta(M^n, \hat\theta_n) = 1, \quad \forall\theta \in \Theta \tag{6.74}$$

is often imposed for a sequence of estimators $\{(M^n, \hat\theta_n)\}_{n=1}^\infty$. The second condition
guarantees a kind of uniformity of the convergence of $E_\theta(M^n, \hat\theta_n)$ to $\theta$. We are now
ready for the following theorem.


Theorem 6.6 (Helstrom [2], Nagaoka [13]) If a sequence of estimators $\{(M^n, \hat\theta_n)\}$
satisfies (6.74), then the following inequality holds:

$$\lim_{n\to\infty} nV_\theta(M^n, \hat\theta_n) \ge J_{\theta,s}^{-1}. \tag{6.75}$$

In the nonasymptotic case, when the locally unbiased condition (6.73) holds, inequality
(6.75) also holds without limit [2]. Its proof is similar to the proof of Theorem 6.6.

In the above discussion, we focus on the asymptotic unbiasedness condition. 
Using the van Trees inequality [19, 20], we can prove the same inequality almost 
everywhere without any assumption [21]. However, our method has the following 
advantage. Indeed, our method concerns one point. Hence, by choosing a coordinate 
suitable for one point, we can treat the general error function in the asymptotic 
setting. However, the van Trees method can be applied only to an error function with 
a quadratic form because the van Trees method concerns a Bayes prior distribution, 
i.e., all points. 


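Proof Define $O(M^n, \hat\theta_n) := \sum_\omega \hat\theta_n(\omega)M^n(\omega)$. Since $\sum_\omega(\hat\theta_n(\omega) - \theta)M^n(\omega) =
O(M^n, \hat\theta_n) - \theta I$,

$$0 \le \sum_\omega \bigl((\hat\theta_n(\omega) - \theta) - (O(M^n, \hat\theta_n) - \theta I)\bigr)M^n(\omega)\bigl((\hat\theta_n(\omega) - \theta) - (O(M^n, \hat\theta_n) - \theta I)\bigr)$$
$$= \sum_\omega (\hat\theta_n(\omega) - \theta)M^n(\omega)(\hat\theta_n(\omega) - \theta) - (O(M^n, \hat\theta_n) - \theta I)^2. \tag{6.76}$$

The Schwarz inequality for the metric $\langle\ ,\ \rangle^{(e)}_{\rho_\theta^{\otimes n},s}$ yields

$$V_\theta(M^n, \hat\theta_n) = \mathrm{Tr}\sum_\omega(\hat\theta_n(\omega) - \theta)M^n(\omega)(\hat\theta_n(\omega) - \theta)\,\rho_\theta^{\otimes n}$$
$$\ge \mathrm{Tr}\,(O(M^n, \hat\theta_n) - \theta I)^2\rho_\theta^{\otimes n} = \Bigl(\bigl\|O(M^n, \hat\theta_n) - \theta I\bigr\|^{(e)}_{\rho_\theta^{\otimes n},s}\Bigr)^2 \tag{6.77}$$
$$\ge \frac{\Bigl(\bigl\langle L_{\theta,s,n},\, O(M^n, \hat\theta_n) - \theta I\bigr\rangle^{(e)}_{\rho_\theta^{\otimes n},s}\Bigr)^2}{\Bigl(\bigl\|L_{\theta,s,n}\bigr\|^{(e)}_{\rho_\theta^{\otimes n},s}\Bigr)^2}, \tag{6.78}$$

where $L_{\theta,s,n}$ denotes the SLD e representation of the derivative of the state family
$\{\rho_\theta^{\otimes n} \mid \theta \in \mathbb{R}\}$. Since $\mathrm{Tr}\,\frac{d\rho_\theta^{\otimes n}}{d\theta} = \frac{d\,\mathrm{Tr}\,\rho_\theta^{\otimes n}}{d\theta} = 0$, we have

$$\frac{d}{d\theta}E_\theta(M^n, \hat\theta_n)\Big|_{\theta=\theta_0} = \mathrm{Tr}\,O(M^n, \hat\theta_n)\frac{d\rho_\theta^{\otimes n}}{d\theta}\Big|_{\theta=\theta_0}
= \mathrm{Tr}\,(O(M^n, \hat\theta_n) - \theta_0I)\frac{d\rho_\theta^{\otimes n}}{d\theta}\Big|_{\theta=\theta_0}$$
$$= \bigl\langle L_{\theta_0,s,n},\, O(M^n, \hat\theta_n) - \theta_0I\bigr\rangle^{(e)}_{\rho_{\theta_0}^{\otimes n},s} \tag{6.79}$$

from the definition of $E_\theta(M^n, \hat\theta_n)$. Combining the above two formulas with Exercise
6.19, we obtain

$$nV_{\theta_0}(M^n, \hat\theta_n) \ge n\,\frac{\bigl(\frac{d}{d\theta}E_\theta(M^n, \hat\theta_n)\big|_{\theta=\theta_0}\bigr)^2}{nJ_{\theta_0,s}}.$$

Taking the limit, we obtain (6.75) with $\theta = \theta_0$. ∎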


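According to the above proof, if equality (6.74) holds for a finite number n, then
inequality (6.75) also holds for the same number n. Conversely, given a point $\theta_0 \in \mathbb{R}$,
we choose the function $\hat\theta_{\theta_0,n}$ and projections $E_{\theta_0,n}(\omega)$ such that the following is the
spectral decomposition of the matrix $\frac{L_{\theta_0,s,n}}{nJ_{\theta_0,s}} + \theta_0$:

$$\frac{L_{\theta_0,s,n}}{nJ_{\theta_0,s}} + \theta_0 = \sum_\omega E_{\theta_0,n}(\omega)\,\hat\theta_{\theta_0,n}(\omega). \tag{6.80}$$

Then, $(E_{\theta_0,n} = \{E_{\theta_0,n}(\omega)\}, \hat\theta_{\theta_0,n})$ gives an estimator satisfying

$$V_{\theta_0}(E_{\theta_0,n}, \hat\theta_{\theta_0,n}) = \frac{1}{nJ_{\theta_0,s}}, \quad E_{\theta_0}(E_{\theta_0,n}, \hat\theta_{\theta_0,n}) = \theta_0.$$

We may then show that

$$\frac{d}{d\theta}E_\theta(E_{\theta_0,n}, \hat\theta_{\theta_0,n})\Big|_{\theta=\theta_0} = \bigl\langle L_{\theta_0,s,n},\, O(E_{\theta_0,n}, \hat\theta_{\theta_0,n}) - \theta_0\bigr\rangle^{(e)}_{\rho_{\theta_0}^{\otimes n},s}
= \Bigl\langle L_{\theta_0,s,n},\, \frac{1}{nJ_{\theta_0,s}}L_{\theta_0,s,n}\Bigr\rangle^{(e)}_{\rho_{\theta_0}^{\otimes n},s} = 1, \tag{6.81}$$

in a similar way to (6.79). This guarantees the existence of an estimator satisfying
(6.75) under condition (6.74). However, it is crucial that the construction of $(E_{\theta_0,n}, \hat\theta_{\theta_0,n})$
depends on $\theta_0$. We may expect from (6.81) that if $\theta$ is in the neighborhood of $\theta_0$,
$V_\theta(E_{\theta_0,n}, \hat\theta_{\theta_0,n})$ would not be very different from $V_{\theta_0}(E_{\theta_0,n}, \hat\theta_{\theta_0,n})$. However, if $\theta$ is far
away from $\theta_0$, it is impossible to estimate $V_\theta(E_{\theta_0,n}, \hat\theta_{\theta_0,n})$. The reason is that the SLD
$L_{\theta_0,s,n}$ depends on $\theta_0$. If the SLD $L_{\theta_0,s,n}$ did not depend on $\theta_0$, one would expect
that an appropriate estimator could be constructed independently of $\theta_0$. This is the
subject of the following theorem.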
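The construction of the estimator from the spectral decomposition of $L/(nJ) + \theta_0$ can be illustrated on a single qubit ($n = 1$). The sketch below is an illustration under stated assumptions: NumPy, a hypothetical one-parameter family that rotates a Bloch vector of fixed length `R` in the plane of $S_1$ and $S_3$, and the SLD computed entrywise in the eigenbasis of the state from $\frac{d\rho}{d\theta} = \frac12(L\rho + \rho L)$.

```python
import numpy as np

S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S3 = np.array([[1, 0], [0, -1]], dtype=complex)
R = 0.8  # Bloch radius of the illustrative family (an assumption)

def rho(theta):
    return 0.5 * (np.eye(2) + R * (np.cos(theta) * S1 + np.sin(theta) * S3))

def drho(theta):
    return 0.5 * R * (-np.sin(theta) * S1 + np.cos(theta) * S3)

def sld(state, dstate):
    """Solve dstate = (L state + state L) / 2 in the eigenbasis of state."""
    lam, v = np.linalg.eigh(state)
    d = v.conj().T @ dstate @ v
    return v @ (2 * d / (lam[:, None] + lam[None, :])) @ v.conj().T

theta0 = 0.3
state, dstate = rho(theta0), drho(theta0)
L = sld(state, dstate)
J = np.trace(dstate @ L).real            # SLD Fisher information

# Estimator: eigenvalues are the estimates, eigenvectors the projections.
est, vecs = np.linalg.eigh(L / J + theta0 * np.eye(2))
prob = np.einsum("ij,ik,kj->j", vecs.conj(), state, vecs).real
dprob = np.einsum("ij,ik,kj->j", vecs.conj(), dstate, vecs).real

mean = float(est @ prob)                   # expectation at theta0
mse = float((est - theta0) ** 2 @ prob)    # mean square error at theta0
slope = float(est @ dprob)                 # derivative of the expectation

print(J, mean, mse, slope)
```

For this family the estimator is locally unbiased at `theta0` and its mean square error equals `1 / J`, but both properties are tied to the chosen point, which is the dependence on $\theta_0$ discussed above.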


Theorem 6.7 (Nagaoka [13, 22]) Assume that a distribution $p(\theta)$ satisfying

$$\int \rho_\theta\,p(\theta)\,d\theta > 0 \tag{6.82}$$

exists. Then, the following two conditions for the quantum state family $\rho_\theta$ and the
estimator $(M, \hat\theta)$ are equivalent.

① The estimator $(M, \hat\theta)$ satisfies the unbiasedness condition (6.72), and the MSE
$V_\theta(M, \hat\theta)$ satisfies

$$V_\theta(M, \hat\theta) = J_{\theta,s}^{-1}. \tag{6.83}$$

② The state family is an SLD e geodesic $\rho_\theta = \Pi^\theta_{L,s}\rho_0$ given by (6.44); further, the
parameter to be estimated equals the expectation parameter $\eta = \mathrm{Tr}\,L\rho_\theta$, and
the estimator $(M, \hat\theta)$ equals the spectral decomposition of $L$.

See Exercises 6.36 and 6.37 for the proof of the above theorem.

Therefore, the bound $J_{\theta,s}^{-1}$ is attained in the nonasymptotic case only for the case
(6.44), i.e., the SLD e geodesic curve. Another example is the case when a POVM
$M$ exists such that

$$J^M_\theta = J_{\theta,s} \quad \text{for } \forall\theta, \tag{6.84}$$

where $J^M_\theta$ is the Fisher information of the probability distribution family $\{P^M_{\rho_\theta} \mid \theta \in \mathbb{R}\}$.
Then, if one performs the measurement $M$ on n prepared systems and chooses the
maximum likelihood estimator of the n outcomes of the probability distribution
family $\{P^M_{\rho_\theta} \mid \theta \in \mathbb{R}\}$, the equality in inequality (6.75) is ensured according to the
discussion in Sect. 2.3. Therefore, $J_{\theta,s}^{-1}$ is also attainable asymptotically in this case.
In general, a POVM $M$ satisfying (6.84) rarely exists. Such a state family is called
quasiclassical, and such a POVM is called a quasiclassical POVM [23].
Besides the above rare examples, the equality of (6.75) is satisfied in the limit
$n \to \infty$ at all points, provided a sequence of estimators $\{(M^n, \hat\theta_n)\}$ is constructed
according to the following two-step estimation procedure [24, 25]. First, perform
the measurement $M$ satisfying $J^M_\theta > 0$ for the first $\sqrt{n}$ systems. Next, perform the
measurement $E_{\hat\theta_{ML,\sqrt{n}},\,n-\sqrt{n}}$ (defined previously) for the remaining $n - \sqrt{n}$ systems,
based on the maximum likelihood estimator $\hat\theta_{ML,\sqrt{n}}$ for the probability distribution
family $\{P^M_{\rho_\theta} \mid \theta \in \mathbb{R}\}$. Finally, choose the final estimate according to $\hat\theta_n := \hat\theta_{\hat\theta_{ML,\sqrt{n}},\,n-\sqrt{n}}$, as
given in (6.80). If n is sufficiently large, $\hat\theta_{ML,\sqrt{n}}$ will be in the neighborhood of
the true parameter $\theta$ with a high probability. Hence, the expectation of $(\hat\theta_n - \theta)^2$ is
approximately equal to $\frac{1}{(n-\sqrt{n})J_{\theta,s}}$. Since $\lim_{n\to\infty} n\,\frac{1}{(n-\sqrt{n})J_{\theta,s}} = \frac{1}{J_{\theta,s}}$, we can
expect this estimator to satisfy the equality in (6.75). In fact, it is known that such
an estimator does satisfy the equality in (6.75) [24, 25].

In summary, for the single-parameter case, it is the SLD Fisher metric and not the 
Bogoljubov Fisher metric that gives the tight bound in estimation theory. On the other 
hand, the Bogoljubov Fisher metric does play a role in large deviation evaluation, 
although it appears in a rather restricted way. 


Exercises 


6.34 Show that the measurement E49, defined in (6.80) satisfies nJg,, = a, 
6.35 Using the above result, show that an arbitrary inner product Ane? satisfies 


JAI < Al when (6.24) and |p! ||) = 1 hold. 


6.36 Prove Theorem 6.7 for $\rho_\theta > 0$ following the steps below [13, 22].
(a) Assume that the estimator $(M, \hat\theta)$ for the SLD e geodesic $\Pi^\theta_{L,s}\rho$ is given by the
spectral decomposition of $L$. Show that the estimator $(M, \hat\theta)$ satisfies the unbiasedness
condition with respect to the expectation parameter.

(b) For an SLD e geodesic, show that $\frac{d\eta}{d\theta} = J_{\theta,s}$.

(c) For an SLD e geodesic, show that the SLD Fisher information $J_{\eta,s}$ for an expectation
parameter $\eta$ is equal to the inverse $J_{\theta,s}^{-1}$ of the SLD Fisher information for the
natural parameter $\theta$.

(d) Show that ① follows from ②.

(e) Show that $\theta(\eta) = \int_0^\eta J_{\eta',s}\,d\eta'$ for an SLD e geodesic curve.

(f) Show that $\mu_s(\theta(\eta)) = \int_0^\eta \eta'J_{\eta',s}\,d\eta'$ for an SLD e geodesic curve.

(g) Show that if ① is true, $\frac{1}{J_{\eta,s}}L_{\eta,s} = O(M, \hat\theta) - \eta$.

(h) Show that if ① is true, then $\frac{d\rho_\eta}{d\eta} = \frac{J_{\eta,s}}{2}\bigl((O(M, \hat\theta) - \eta)\rho_\eta + \rho_\eta(O(M, \hat\theta) - \eta)\bigr)$,
where $\eta$ is the parameter to be estimated.

(i) Show that if $n = 1$ and $\rho_\theta > 0$, the equality in (6.77) is satisfied only if the
estimator $(M, \hat\theta)$ is the spectral decomposition of $O(M, \hat\theta)$.

(j) Show that if ① holds, then ② holds.




6.37 Show that Theorem 6.7 holds even if $\rho_\theta > 0$ is not true, following the steps
below. The fact that ②⇒① still follows from above.

(a) Show that (h) in Exercise 6.36 still holds for $\rho_\theta \ge 0$.

(b) Show that ① yields ② by using the condition (6.82).


6.38 Similarly to (6.77), show 


(e) ) 
per 


> Tr (6) - 0) M" wp" > (| O(M", by) 


6.5 Large Deviation Evaluation 


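In Sect. 2.4.2, we discussed the large deviation type estimation of a probability
distribution for the case of a single parameter. In this section, we will examine the
theory for large deviation in the case of quantum state estimation. As defined in
(2.173) and (2.174), $\beta(\{(M^n, \hat\theta_n)\}, \theta, \epsilon)$ and $\alpha(\{(M^n, \hat\theta_n)\}, \theta)$ are defined as follows:

$$\beta(\{(M^n, \hat\theta_n)\}, \theta, \epsilon) := \lim_{n\to\infty} -\frac1n\log\mathrm{Tr}\,\rho_\theta^{\otimes n}M^n\{|\hat\theta_n - \theta| \ge \epsilon\}, \tag{6.85}$$

$$\alpha(\{(M^n, \hat\theta_n)\}, \theta) := \lim_{\epsilon\to 0}\frac{\beta(\{(M^n, \hat\theta_n)\}, \theta, \epsilon)}{\epsilon^2}. \tag{6.86}$$

The notation $M^n\{|\hat\theta_n - \theta| \ge \epsilon\}$ requires some explanation. For the general POVM
$M = \{M(\omega)\}$ and the set $B$, let us define $MB$ according to

$$MB := \sum_{\omega\in B}M(\omega). \tag{6.87}$$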

Then, we have the following theorem in analogy to Theorem 2.9. 


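Theorem 6.8 (Nagaoka [12]) Let the sequence of estimators $\vec{M} = \{(M^n, \hat\theta_n)\}$ satisfy
the weak consistency condition:

$$\mathrm{Tr}\,\rho_\theta^{\otimes n}M^n\{|\hat\theta_n - \theta| \ge \epsilon\} \to 0, \quad \forall\epsilon > 0,\ \forall\theta \in \mathbb{R}. \tag{6.88}$$

The following then holds:

$$\beta(\{(M^n, \hat\theta_n)\}, \theta, \epsilon) \le \inf_{\theta':|\theta'-\theta|>\epsilon} D(\rho_{\theta'}\|\rho_\theta), \tag{6.89}$$

$$\alpha(\{(M^n, \hat\theta_n)\}, \theta) \le \frac12 J_{\theta,b}. \tag{6.90}$$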


A different inequality that evaluates the performance of the estimator may be 
obtained by employing a slight reformulation. That is, a relation similar to (6.89) can 
be obtained as given by the following lemma. 
Lemma 6.2 (Hayashi [26]) Define $\beta'(\vec{M}, \theta, \delta) := \lim_{\epsilon\to+0}\beta(\vec{M}, \theta, \delta - \epsilon)$ for the
sequence of estimators $\vec{M} = \{(M^n, \hat\theta_n)\}$. The following inequality then holds:

$$\inf_{s:\,0\le s\le 1}\bigl(\beta'(\vec{M}, \theta, s\delta) + \beta'(\vec{M}, \theta + \delta, (1-s)\delta)\bigr) \le -2\log\mathrm{Tr}\,\bigl|\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\delta}}\bigr|. \tag{6.91}$$

The essential part in the proof of this lemma is that the information $-\log\mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}|$
satisfies the information-processing inequality.
The relation corresponding to (6.90) is then given by the following theorem.


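Theorem 6.9 (Hayashi [26]) Let the sequence of estimators $\vec{M} = \{(M^n, \hat\theta_n)\}$ satisfy
the weak consistency condition and the uniform convergence on the RHS of (6.86)
with respect to $\theta$. Define $\alpha'(\vec{M}, \theta_0) := \lim_{\theta\to\theta_0}\alpha(\vec{M}, \theta)$. The following inequality
then holds:

$$\alpha'(\vec{M}, \theta) \le \frac{J_{\theta,s}}{2}. \tag{6.92}$$

Hence, the bound $\frac{J_{\theta,s}}{2}$ can be regarded as the bound under the following condition
for a sequence of estimators:

$$\alpha(\vec{M}, \theta_0) = \lim_{\theta\to\theta_0}\alpha(\vec{M}, \theta), \quad \beta(\vec{M}, \theta, \delta) = \lim_{\epsilon\to+0}\beta(\vec{M}, \theta, \delta - \epsilon). \tag{6.93}$$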
0 0 or 


So far, we have discussed the upper bound of $\alpha(M,\theta)$ in two ways. The upper
bound can be attained in both ways, as we now describe. Let us first focus
on the upper bound $\frac{J_{\theta,s}}{2}$ given by (6.92), which is based on the SLD Fisher information.
This upper bound can be attained by a sequence of estimators $M = \{(M^n,\hat\theta_n)\}$ such
that $\alpha'(M,\theta_0) = \alpha(M,\theta_0)$ and the RHS of (6.86) converges uniformly with respect to
$\theta$. This kind of estimator can be constructed according to the two-step estimator given
in the previous section [26]. Let us now examine the upper bound given by (6.90)
using the Bogoljubov Fisher information, which equals $\frac{J_{\theta,b}}{2}$ in this case. This bound
can be attained by a sequence of estimators satisfying the weak consistency condition
but not the uniform convergence on the RHS of (6.86). However, this estimator can
attain the bound $\frac{J_{\theta,b}}{2}$ only at a single point [26]. Although this method of construction
is rather obtuse, the method is similar to the construction of the measurement that
attains the bound $D(\rho\|\sigma)$ given in Stein's lemma for hypothesis testing. Of course,
such an estimator is extremely unnatural and cannot be used in practice. Therefore, we
see that the two bounds provide the respective answers for two completely separate
problems. In a classical system, the bounds for these two problems are identical. This
difference arises due to the quantum nature of the problem.

The above discussion indicates that geometrical characterization does not connect 
to quantum state estimation. However, there are two different approaches from the 
geometrical viewpoint. For example, Hayashi [27] focused on the scalar curvature 
of the Riemannian connection and clarified the relation between the scalar curvature
and the second-order asymptotics of estimation error only for specific state families.
These approaches treat the translation of the tangent bundle of state space. Matsumoto 
[28-30] focused on that of the line bundle and discovered the relation between the 
curvature and the bound of estimation error for the pure-state family. He pointed out 
that the difficulty rooted in two parameters is closely related to the curvature. 


Exercises 
6.39 Prove Theorem 6.8 referring to the Proof of Theorem 2.9. 


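6.40 Prove Lemma 6.2 following the steps below.
(a) Show that $\log \mathrm{Tr}\,|\sqrt{\rho^{\otimes n}}\sqrt{\sigma^{\otimes n}}| = n\log \mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}|$.
(b) Show that

$$\begin{aligned}
\log \mathrm{Tr}\,\Bigl|\sqrt{\rho_\theta^{\otimes n}}\sqrt{\rho_{\theta+\delta}^{\otimes n}}\Bigr|
\le \log\Bigl[ &\Bigl(\mathrm{Tr}\,\rho_{\theta+\delta}^{\otimes n}M^n\bigl\{|\hat\theta-(\theta+\delta)|\ge \tfrac{\delta(m-1)}{m}\bigr\}\Bigr)^{\frac12}
+ \Bigl(\mathrm{Tr}\,\rho_\theta^{\otimes n}M^n\bigl\{|\hat\theta-\theta|\ge\delta\bigr\}\Bigr)^{\frac12} \\
&+ \sum_{i=1}^{m-1}\Bigl(\mathrm{Tr}\,\rho_\theta^{\otimes n}M^n\bigl\{|\hat\theta-\theta|\ge \tfrac{\delta i}{m}\bigr\}\Bigr)^{\frac12}
\Bigl(\mathrm{Tr}\,\rho_{\theta+\delta}^{\otimes n}M^n\bigl\{|\hat\theta-(\theta+\delta)|\ge \tfrac{\delta(m-i-1)}{m}\bigr\}\Bigr)^{\frac12}\Bigr]
\end{aligned}$$

for an arbitrary integer $m$ from the fact that the amount of information
$-\log \mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}|$ satisfies the information-processing inequality.

(c) Choosing a sufficiently large integer $N$ for a real number $\epsilon > 0$ and an integer
$m$, we have

$$\frac{1}{n}\log \mathrm{Tr}\,\rho_\theta^{\otimes n}M^n\bigl\{|\hat\theta-\theta|\ge \tfrac{\delta i}{m}\bigr\} \le -\beta\bigl(M,\theta,\tfrac{\delta i}{m}\bigr) + \epsilon$$

for $\forall n \ge N$, $0 \le \forall i \le m$. Show that

$$n\log \mathrm{Tr}\,\bigl|\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\delta}}\bigr|
\le \log(m+2) - \frac{n}{2}\min_{0\le i\le m}\Bigl(\beta\bigl(M,\theta,\tfrac{\delta i}{m}\bigr) + \beta\bigl(M,\theta+\delta,\tfrac{\delta(m-i-1)}{m}\bigr) - 2\epsilon\Bigr).$$

(d) Show the following for an arbitrary integer $m$:

$$-\log \mathrm{Tr}\,\bigl|\sqrt{\rho_\theta}\sqrt{\rho_{\theta+\delta}}\bigr|
\ge \frac{1}{2}\min_{0\le i\le m}\Bigl(\beta\bigl(M,\theta,\tfrac{\delta i}{m}\bigr) + \beta\bigl(M,\theta+\delta,\tfrac{\delta(m-i-1)}{m}\bigr) - 2\epsilon\Bigr) - \frac{1}{n}\log(m+2).$$

(e) Prove (6.91) using (d).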


6.41 Prove Theorem 6.9 using Lemma 6.2. 


6.6 Multiparameter Estimation 


Let us now examine the case of a multidimensional parameter space $\Theta$ (dimension
$d$). Assume that the unknown state lies in the multiparameter quantum state family
$\{\rho_\theta|\theta\in\Theta\subset\mathbb{R}^d\}$. A typical estimation procedure is as follows. An appropriate
POVM $M$ is chosen in a manner similar to the previous one (excluding those for which the
Fisher matrix $J^M_\theta$ has zero eigenvalues). Then, a measurement corresponding to $M$
is performed on each of $n$ quantum systems, all prepared in the same unknown
state. The final estimate is then given by the maximum
likelihood estimator for the probability distribution family $\{P^M_\theta|\theta\in\Theta\subset\mathbb{R}^d\}$.
According to Sect. 2.3, the mean square error matrix asymptotically approaches
$\frac{1}{n}(J^M_\theta)^{-1}$ in this case. The maximum likelihood estimator then approaches the true
parameter $\theta$ in probability. As mentioned in the previous section, our problem is
the optimization of the quantum measurement $M$ for our estimation. To this end,
we need to find an estimator minimizing the mean square error $V^i_\theta(M^n,\hat\theta_n)$ for the
$i$th parameter $\theta^i$ or the mean square error matrix $V_\theta(M^n,\hat\theta_n) = [V^{i,j}_\theta(M^n,\hat\theta_n)]$ by
taking into account the correlations between the $\theta^i$, where

$$V^{i,j}_\theta(M^n,\hat\theta_n) \stackrel{\mathrm{def}}{=} \sum_\omega \bigl(\hat\theta^i_n(\omega)-\theta^i\bigr)\bigl(\hat\theta^j_n(\omega)-\theta^j\bigr)\,\mathrm{Tr}\,\rho_\theta^{\otimes n}M^n(\omega). \tag{6.94}$$
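To make the roles of the classical Fisher matrix $J^M_\theta$ and the asymptotic error $\frac{1}{n}(J^M_\theta)^{-1}$ concrete, the following sketch evaluates them for a two-parameter qubit family. Everything specific here is an assumption chosen for illustration: the Bloch-vector parameterization $\rho_\theta = (I + \theta^1\sigma_x + \theta^2\sigma_z)/2$, the POVM that measures $\sigma_x$ or $\sigma_z$ with probability 1/2 each, and the function names. The SLD Fisher matrix is computed by solving $\partial_i\rho = (L_i\rho + \rho L_i)/2$ in the eigenbasis of $\rho_\theta$.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def state(theta):
    """Assumed example family: qubit with Bloch vector (theta[0], 0, theta[1])."""
    return 0.5 * (I2 + theta[0] * SX + theta[1] * SZ)

# Derivatives of the state with respect to theta^1 and theta^2.
D_RHO = [0.5 * SX, 0.5 * SZ]

def classical_fisher(theta, povm):
    """Fisher matrix J^M of the outcome distribution P^M_theta(w) = Tr rho_theta M(w)."""
    rho = state(theta)
    fisher = np.zeros((2, 2))
    for m in povm:
        p = np.trace(rho @ m).real
        grad = np.array([np.trace(d @ m).real for d in D_RHO])
        fisher += np.outer(grad, grad) / p
    return fisher

def sld_fisher(theta):
    """SLD Fisher matrix, with the SLDs obtained in the eigenbasis of rho_theta."""
    rho = state(theta)
    w, v = np.linalg.eigh(rho)
    slds = []
    for d in D_RHO:
        d_eig = v.conj().T @ d @ v
        l_eig = 2.0 * d_eig / (w[:, None] + w[None, :])
        slds.append(v @ l_eig @ v.conj().T)
    return np.array([[np.trace(rho @ (a @ b + b @ a)).real / 2 for b in slds]
                     for a in slds])

# POVM: measure sigma_x or sigma_z, each chosen with probability 1/2.
povm = [0.25 * (I2 + s * P) for P in (SX, SZ) for s in (+1, -1)]

theta = np.array([0.3, 0.4])
J_M = classical_fisher(theta, povm)
J_S = sld_fisher(theta)

n = 1000
asymptotic_mse = np.linalg.inv(J_M) / n            # (1/n) (J^M)^{-1}
trace_ratio = np.trace(np.linalg.inv(J_S) @ J_M)   # compare with dim H - 1 = 1

print(J_M, J_S, asymptotic_mse, trace_ratio, sep="\n")
```

In this sketch $J_{\theta,s} - J^M_\theta$ should be positive semidefinite, and `trace_ratio` should not exceed $\dim\mathcal{H} - 1 = 1$, in line with inequality (6.105) of Exercise 6.43.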


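The unbiasedness condition is then given by

$$E^i_\theta(M^n,\hat\theta_n) \stackrel{\mathrm{def}}{=} \sum_\omega \hat\theta^i_n(\omega)\,\mathrm{Tr}\,\rho_\theta^{\otimes n}M^n(\omega) = \theta^i, \quad \forall\theta\in\Theta.$$

In the asymptotic case, for a sequence of estimators $\{(M^n,\hat\theta_n)\}$, we can also write
down the asymptotic unbiasedness condition

$$\lim_{n\to\infty} E^i_\theta(M^n,\hat\theta_n) = \theta^i, \qquad \lim_{n\to\infty}\frac{\partial}{\partial\theta^j}E^i_\theta(M^n,\hat\theta_n) = \delta^i_j, \quad \forall\theta\in\Theta. \tag{6.95}$$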


Theorem 6.10 Let the sequence of estimators $\{(M^n,\hat\theta_n)\}$ satisfy the asymptotic
unbiasedness condition (6.95) and have the limit $V_\theta(\{M^n,\hat\theta_n\}) \stackrel{\mathrm{def}}{=} \lim_{n\to\infty} nV_\theta(M^n,\hat\theta_n)$.
The following matrix inequality then holds [2, 7]:

$$\lim_{n\to\infty} nV_\theta(M^n,\hat\theta_n) \ge (J_{\theta,x})^{-1}, \quad x = s, r. \tag{6.96}$$
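Proof First, assume that any two complex vectors $|b\rangle = (b_1,\ldots,b_d)^T \in \mathbb{C}^d$ and
$|a\rangle \in \mathbb{C}^d$ satisfy

$$\langle b|V_\theta(\{M^n,\hat\theta_n\})|b\rangle\,\langle a|J_{\theta,x}|a\rangle \ge |\langle b|a\rangle|^2. \tag{6.97}$$

Substituting $a = (J_{\theta,x})^{-1}b$ into (6.97), we have $\langle b|V_\theta(\{M^n,\hat\theta_n\})|b\rangle \ge \langle b|(J_{\theta,x})^{-1}|b\rangle$,
since $(J_{\theta,x})^{-1}$ is Hermitian. We therefore obtain (6.96).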
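The substitution step can be written out explicitly. The following short derivation is a sketch that uses only (6.97) together with the Hermiticity and positivity of $J_{\theta,x}$; it assumes $b \neq 0$, so that $\langle b|(J_{\theta,x})^{-1}|b\rangle > 0$.

```latex
\begin{align*}
\langle b|V_\theta(\{M^n,\hat\theta_n\})|b\rangle\,\langle a|J_{\theta,x}|a\rangle
  &\ge |\langle b|a\rangle|^2, \qquad a = (J_{\theta,x})^{-1} b, \\
\langle a|J_{\theta,x}|a\rangle
  &= \langle b|(J_{\theta,x})^{-1} J_{\theta,x} (J_{\theta,x})^{-1}|b\rangle
   = \langle b|(J_{\theta,x})^{-1}|b\rangle, \\
\langle b|a\rangle
  &= \langle b|(J_{\theta,x})^{-1}|b\rangle, \\
\Longrightarrow\quad
\langle b|V_\theta(\{M^n,\hat\theta_n\})|b\rangle
  &\ge \frac{\langle b|(J_{\theta,x})^{-1}|b\rangle^2}{\langle b|(J_{\theta,x})^{-1}|b\rangle}
   = \langle b|(J_{\theta,x})^{-1}|b\rangle .
\end{align*}
```

Since $b$ is arbitrary, the last line is exactly the matrix inequality (6.96).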


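We next show (6.97). Define $O_n \stackrel{\mathrm{def}}{=} \sum_\omega \bigl(\sum_i (\hat\theta^i_n(\omega)-\theta^i)b_i\bigr)M^n(\omega)$ and $L_n \stackrel{\mathrm{def}}{=} \sum_j L_{\theta,j,x,n}a_j$.
Using (6.77) and Exercise 6.38, we can show that

$$\langle b|V_\theta(\{M^n,\hat\theta_n\})|b\rangle = \lim_{n\to\infty} n\sum_\omega \Bigl|\sum_i (\hat\theta^i_n(\omega)-\theta^i)b_i\Bigr|^2 \mathrm{Tr}\,\rho_\theta^{\otimes n}M^n(\omega)
\ge \lim_{n\to\infty} n\Bigl(\|O_n\|^{(e)}_{\rho_\theta^{\otimes n},x}\Bigr)^2$$

in a manner similar to (6.76). Then

$$\langle b|a\rangle = \lim_{n\to\infty}\sum_{i,j} b_i\,\frac{\partial}{\partial\theta^j}E^i_\theta(M^n,\hat\theta_n)\,a_j = \lim_{n\to\infty}\langle O_n, L_n\rangle^{(e)}_{\rho_\theta^{\otimes n},x}$$

in a manner similar to (6.81). Using the Schwarz inequality, we can show that

$$\Bigl(\|O_n\|^{(e)}_{\rho_\theta^{\otimes n},x}\Bigr)^2\Bigl(\|L_n\|^{(e)}_{\rho_\theta^{\otimes n},x}\Bigr)^2 \ge \Bigl|\langle O_n, L_n\rangle^{(e)}_{\rho_\theta^{\otimes n},x}\Bigr|^2.$$

Inequality (6.97) can be obtained on taking the limit because
$\langle a|J_{\theta,x}|a\rangle = \frac{1}{n}\bigl(\|L_n\|^{(e)}_{\rho_\theta^{\otimes n},x}\bigr)^2$. ∎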

In general, there is no sequence of estimators that satisfies the equality in (6.96).
Furthermore, as the matrix $V_\theta(M^n,\hat\theta_n)$ is a real symmetric matrix and not a
real number, there is no general minimum matrix $V_\theta(M^n,\hat\theta_n)$ among the
estimators satisfying (6.95). Instead, one can adopt the sum of MSE, i.e., the trace of
$V_\theta(M^n,\hat\theta_n)$, as our error criterion. It is therefore necessary to consider the minimum
of $\mathrm{tr}\lim_{n\to\infty} nV_\theta(M^n,\hat\theta_n)$ in the asymptotic case.

From (6.96) the lower bound of the minimum value of this quantity can be
evaluated as

$$\mathrm{tr}\lim_{n\to\infty} nV_\theta(M^n,\hat\theta_n) \ge \min\{\mathrm{tr}\,V \,|\, V: \text{real symmetric},\ V \ge (J_{\theta,x})^{-1}\} \tag{6.98}$$

because $V_\theta(M^n,\hat\theta_n)$ is real symmetric. If $(J_{\theta,x})^{-1}$ is real symmetric, the RHS is equal
to $\mathrm{tr}(J_{\theta,x})^{-1}$. If $(J_{\theta,x})^{-1}$ is a Hermitian matrix but contains imaginary elements, the
RHS will be larger than $\mathrm{tr}(J_{\theta,x})^{-1}$. In this case, we may calculate [7]

$$\min\{\mathrm{tr}\,V \,|\, V: \text{real symmetric},\ V \ge J_{\theta,x}^{-1}\} = \mathrm{tr}\,\mathrm{Re}(J_{\theta,x}^{-1}) + \mathrm{tr}\,|\mathrm{Im}(J_{\theta,x}^{-1})|. \tag{6.99}$$
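The right-hand side of (6.99) is easy to evaluate numerically. The sketch below is an illustration only: the matrix `K` is a hypothetical Hermitian positive matrix standing in for an inverse Fisher information matrix with imaginary entries, and the function names are assumptions. It computes $\mathrm{tr}\,\mathrm{Re}\,K + \mathrm{tr}\,|\mathrm{Im}\,K|$ and checks that the candidate $V = \mathrm{Re}\,K + |\mathrm{Im}\,K|$ is a real symmetric matrix satisfying $V \ge K$.

```python
import numpy as np

def abs_matrix(a):
    """|A| = sqrt(A^* A), computed from the singular value decomposition A = U S V^*."""
    _, s, vh = np.linalg.svd(a)
    return (vh.conj().T * s) @ vh

def real_symmetric_trace_bound(k):
    """tr Re(K) + tr|Im(K)| for a Hermitian matrix K, i.e. the RHS of (6.99)."""
    return np.trace(k.real) + np.trace(abs_matrix(k.imag)).real

# Hypothetical Hermitian positive matrix with imaginary off-diagonal entries.
K = np.array([[2.0, 0.5 + 0.8j],
              [0.5 - 0.8j, 1.5]])

bound = real_symmetric_trace_bound(K)
V_candidate = K.real + abs_matrix(K.imag)       # real symmetric candidate minimizer
gap_eigs = np.linalg.eigvalsh(V_candidate - K)  # nonnegative iff V_candidate >= K

print(bound, gap_eigs)
```

For this `K`, the imaginary part is $0.8$ times the elementary antisymmetric matrix, so $|\mathrm{Im}\,K| = 0.8\,I$ and the bound is $3.5 + 1.6 = 5.1$, strictly larger than $\mathrm{tr}\,K = 3.5$.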


For example, $(J_{\theta,s})^{-1}$ and $(J_{\theta,b})^{-1}$ are real symmetric matrices, as discussed in
Exercise 6.22. However, since the inverse RLD Fisher information matrix $(J_{\theta,r})^{-1}$
possesses imaginary components, the RHS of (6.99) in the RLD case will be larger than
$\mathrm{tr}(J_{\theta,r})^{-1}$. Moreover, in order to treat the set of the limits of MSE matrices, we often
minimize $\mathrm{tr}\,G V_\theta(\{M^n,\hat\theta_n\})$. From a discussion similar to (6.99), it can be shown that
the minimum is bounded below by $\mathrm{tr}\sqrt{G}\,\mathrm{Re}(J_{\theta,x}^{-1})\sqrt{G} + \mathrm{tr}\,|\sqrt{G}\,\mathrm{Im}(J_{\theta,x}^{-1})\sqrt{G}|$. Equality
holds only when $V_\theta(\{M^n,\hat\theta_n\}) = \mathrm{Re}(J_{\theta,x}^{-1}) + \sqrt{G}^{-1}|\sqrt{G}\,\mathrm{Im}(J_{\theta,x}^{-1})\sqrt{G}|\sqrt{G}^{-1}$.
When the family in the two-dimensional space has the form $\{\rho_\theta|\|\theta\| = r\}$, the set
of MSE matrices is restricted by the RLD Fisher information matrix, as shown in
Fig. 6.2.

In Fig. 6.2, we use the parameterization $\begin{pmatrix} x_0+x_1 & x_2 \\ x_2 & x_0-x_1 \end{pmatrix}$ and assume that $J_{\theta,s}$
is a constant multiple of the identity matrix. In addition, it was shown that these limits
of MSE matrices can be attained [31]. The above figure also illustrates that the set
of MSE matrices can be realized by the adaptive estimators. See Exercises 6.25 and
6.50.

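The following theorem gives the asymptotic lower bound of $\mathrm{tr}\,V_\theta(M^n,\hat\theta_n)$.

Theorem 6.11 Let the sequence of estimators $\{(M^n,\hat\theta_n)\}$ satisfy the same conditions
as Theorem 6.10. The following inequality then holds:

$$\mathrm{tr}\lim_{n\to\infty} nV_\theta(M^n,\hat\theta_n) \ge \lim_{n\to\infty} n\inf_{M^n:\ \mathrm{POVM\ on}\ \mathcal{H}^{\otimes n}} \mathrm{tr}\,(J^{M^n}_\theta)^{-1}. \tag{6.100}$$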


Conversely, we can construct the estimator attaining the bound $\min_M \mathrm{tr}\,(J^M_\theta)^{-1}$ by
using the adaptive method in a manner similar to the one-parameter case. Moreover,
applying this method to the $n$-fold tensor product system $\mathcal{H}^{\otimes n}$, we can construct
an estimator attaining the bound $n\min_{M^n}\mathrm{tr}\,(J^{M^n}_\theta)^{-1}$. Hence, the set of realizable
classical Fisher information $J^M_\theta$ and the set of $\frac{1}{n}J^{M^n}_\theta$ characterize the bound of
estimation performance. When the family in the two-dimensional space has the form
$\{\rho_\theta|\|\theta\| = r\}$, they are as illustrated in Fig. 6.3. In Fig. 6.3, we assume that $J_{\theta,s}$ is a
constant multiple of the identity matrix.

Fig. 6.2 Fisher information matrices

Fig. 6.3 MSE matrices


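Proof Let us apply the same argument as in the Proof of Theorem 6.10 to the
probability distribution family $\{P^{M^n}_\theta|\theta\}$. Then

$$\langle b|V_\theta(M^n,\hat\theta_n)|b\rangle\,\langle a|J^{M^n}_\theta|a\rangle \ge \Bigl|\sum_{i,j} b_i\,\frac{\partial}{\partial\theta^j}E^i_\theta(M^n,\hat\theta_n)\,a_j\Bigr|^2$$

for complex vectors $|b\rangle = (b_1,\ldots,b_d)^T \in \mathbb{C}^d$ and $|a\rangle \in \mathbb{C}^d$. Define $(A_n)^i_j \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial\theta^j}E^i_\theta(M^n,\hat\theta_n)$
and substitute $a = (J^{M^n}_\theta)^{-1}A_n b$. Then $\langle b|V_\theta(M^n,\hat\theta_n)|b\rangle \ge \langle b|A_n^*(J^{M^n}_\theta)^{-1}A_n|b\rangle$.
Therefore,

$$\begin{aligned}
\lim_{n\to\infty}\mathrm{tr}\,nV_\theta(M^n,\hat\theta_n) &\ge \lim_{n\to\infty}\mathrm{tr}\,A_nA_n^*\,n(J^{M^n}_\theta)^{-1} \\
&\ge \lim_{n\to\infty}\inf_{M^n:\ \mathrm{POVM\ on}\ \mathcal{H}^{\otimes n}}\mathrm{tr}\,A_nA_n^*\,n(J^{M^n}_\theta)^{-1} \\
&= \lim_{n\to\infty} n\inf_{M^n:\ \mathrm{POVM\ on}\ \mathcal{H}^{\otimes n}}\mathrm{tr}\,(J^{M^n}_\theta)^{-1},
\end{aligned}$$

which completes the proof. ∎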


More generally, under the same conditions as in the previous theorem, we have [7]

$$\mathrm{tr}\lim_{n\to\infty} nV_\theta(M^n,\hat\theta_n) \ge \min_X\Bigl\{\mathrm{tr}\,\mathrm{Re}\,V_\theta(X) + \mathrm{tr}\,|\mathrm{Im}\,V_\theta(X)| \,\Bigm|\, \delta^j_i = \mathrm{Tr}\,\frac{\partial\rho_\theta}{\partial\theta^i}X^j\Bigr\}, \tag{6.101}$$

where $X$ is a matrix, $\mathrm{Re}\,X$ is the matrix consisting of the real part of each component
of $X$, $\mathrm{Im}\,X$ is the matrix consisting of the imaginary part of each component of $X$,
and $V_\theta(X) \stackrel{\mathrm{def}}{=} (\mathrm{Tr}\,\rho_\theta X^iX^j)$ for a vector of matrices $X = (X^1,\ldots,X^d)$. It is known
that there exists a sequence of estimators satisfying the equality in (6.101) [32, 33].
In the proof of this argument, the quantum central limit theorem [34, 35] plays an
essential role [32].

Such an argument can be given for infinite-dimensional systems. In particular, 
the quantum Gaussian state family is known as the quantum analog of the Gaussian 
distribution family and is a typical example in an infinite-dimensional system. In the 
classical case, the Gaussian distribution family has been extensively investigated. 
Similarly, the quantum Gaussian state family has been extensively investigated in
the quantum case [7, 16, 31, 32, 36-39].

Another topic related to state estimation is approximate state cloning. Of course,
it is impossible to completely clone a given state. However, an approximate cloning 
is possible by first estimating the state to be cloned, then generating this estimated 
state twice. Although the initial state is changed in this case, it can be approxi- 
mately recovered from the knowledge obtained via the estimation. An approximate 
cloning is therefore possible via state estimation. In fact, it is more convenient to treat 
the cloning process directly without performing the estimation. Then, the optimum 
cloning method is strictly better than the method via estimation [40]. In particular, 
the analysis for approximate state cloning is simplified for spaces having a group 
symmetry, e.g., sets of pure states [41, 42]. An investigation has also been done in 
an attempt to find the interaction that realizes the optimal cloning [43]. The analysis 
is more difficult for problems with less symmetry [44]. 

The probabilistic framework of mathematical statistics has been applied to many 
fields where statistical methods are necessary. In many cases, this probabilistic frame- 
work is merely a convenience for the applied field. That is, the probabilistic descrip- 
tion is often used to supplement the lack of knowledge of the system of interest. In 
such a use of statistical methods, there is a possibility that statistical methods might be 
superseded by other methods due to further developments such as increasing com- 
puter speed and improvements in analysis. However, as discussed in Chap. 1, the 
probabilistic nature of quantum mechanics is intrinsic to the theory itself. Therefore, 
in fact, the framework of mathematical statistics can be naturally applied to quantum 
mechanics. Unfortunately, at present, it is not possible to operate a large number of 
quantum-mechanical particles as a collection of single quantum systems. Therefore, 
when we measure the order of $10^{23}$ particles, we often obtain only the average of
the measured ensemble as the final outcome. The quantum-mechanical correlations 
cannot be controlled in this situation. Furthermore, quantum-mechanical effects such 
as those given in this text cannot be realized. Additionally, when an observable X is 
measured on a system in the state p, the measurement outcome coincides with Tr pX 
with a probability nearly equal to 1. Therefore, statistical methods are clearly not 
necessary in this case. 

As experimental techniques for microscopic systems advance, we
can expect growing demand to operate a large number of quantum-mechanical
particles individually. The measurement outcome will behave probabilistically in this
situation, and therefore mathematical statistical methods will become more
necessary. In fact, in several experiments, statistical methods have already been used to
determine the generated quantum state [45]. Therefore, the theory presented here 
should become more important with future experimental progress. 


Exercises 


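6.42 Show the following facts when a separable POVM $M^n = \{M^n(\omega)\}_{\omega\in\Omega_n}$ in
$\mathcal{H}^{\otimes n}$ is written as $M^n(\omega) = M^n_1(\omega)\otimes\cdots\otimes M^n_n(\omega)$.

(a) Show that a POVM $M_{\theta:n,i}$ defined by (6.102) satisfies the conditions for a POVM
and satisfies (6.103):

$$M_{\theta:n,i}(\omega) \stackrel{\mathrm{def}}{=} M^n_i(\omega)\,\mathrm{Tr}\,\rho_\theta M^n_1(\omega)\cdots\mathrm{Tr}\,\rho_\theta M^n_{i-1}(\omega)\cdot\mathrm{Tr}\,\rho_\theta M^n_{i+1}(\omega)\cdots\mathrm{Tr}\,\rho_\theta M^n_n(\omega), \tag{6.102}$$

$$\sum_{i=1}^n J^{M_{\theta:n,i}}_\theta = J^{M^n}_\theta. \tag{6.103}$$

(b) Show that

$$\mathrm{tr}\lim_{n\to\infty} nV_\theta(M^n,\hat\theta_n) \ge \inf\bigl\{\mathrm{tr}\,(J^M_\theta)^{-1}\,\bigm|\, M\ \mathrm{POVM\ on}\ \mathcal{H}\bigr\}. \tag{6.104}$$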

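6.43 Show the following given a POVM $M = \{M_\omega\}$ in $\mathcal{H}$ of rank $M(\omega) = 1$ [24].

(a) Show that $\displaystyle\sum_\omega \frac{\langle M(\omega), M(\omega)\rangle^{(e)}_{\rho_\theta,s}}{\langle M(\omega), I\rangle^{(e)}_{\rho_\theta,s}} = \dim\mathcal{H}$.

(b) Show that $\displaystyle\mathrm{tr}\,J_{\theta,s}^{-1}J^M_\theta = \sum_\omega\sum_{j=1}^d \frac{\langle M(\omega), L^j_{\theta,s}\rangle^{(e)}_{\rho_\theta,s}\,\langle L_{\theta,j,s}, M(\omega)\rangle^{(e)}_{\rho_\theta,s}}{\langle M(\omega), I\rangle^{(e)}_{\rho_\theta,s}}$, where $L^j_{\theta,s} \stackrel{\mathrm{def}}{=} \sum_{i=1}^d (J_{\theta,s}^{-1})^{j,i}L_{\theta,i,s}$.

(c) Show that

$$\mathrm{tr}\,J_{\theta,s}^{-1}J^M_\theta \le \dim\mathcal{H} - 1. \tag{6.105}$$

When $\rho_\theta > 0$, show that the equality holds if and only if every element $M(\omega)$ can
be written as a linear sum of $I, L_{\theta,1,s},\ldots,L_{\theta,d,s}$.

(d) Give the condition for the equality in (6.105) for cases other than $\rho_\theta > 0$.

(e) Show that inequality (6.105) also holds if the POVM $M = \{M_\omega\}$ is not of
rank $M(\omega) = 1$.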


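6.44 When an estimator $(M,\hat\theta)$ for the state family $\{\rho_\theta|\theta\in\mathbb{R}^d\}$ in $\mathcal{H}$ satisfies

$$\frac{\partial}{\partial\theta^j}E^i_\theta(M,\hat\theta)\Bigr|_{\theta=\theta_0} = \delta^i_j, \qquad E^i_{\theta_0}(M,\hat\theta) = \theta^i_0,$$

it is called a locally unbiased estimator at $\theta_0$. Show that

$$\inf\bigl\{\mathrm{tr}\,(J^M_{\theta_0})^{-1}\,\bigm|\, M\ \mathrm{POVM\ on}\ \mathcal{H}\bigr\}
= \inf\bigl\{\mathrm{tr}\,V_{\theta_0}(M,\hat\theta)\,\bigm|\,(M,\hat\theta):\ \text{a locally unbiased estimator}\bigr\}. \tag{6.106}$$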


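6.45 Show the following equation and that $J = \dfrac{(d-1)\,J_{\theta,s}^{1/2}}{\mathrm{tr}\,J_{\theta,s}^{-1/2}}$ gives the minimum
value [24].

$$\min_{J:\ \text{symmetric matrix}}\bigl\{\mathrm{tr}\,(J^{-1})\,\bigm|\,\mathrm{tr}\,J_{\theta,s}^{-1}J = d-1\bigr\} = \frac{1}{d-1}\Bigl(\mathrm{tr}\,J_{\theta,s}^{-\frac12}\Bigr)^2.$$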


6.46 Fix a normalized vector $u \in \mathbb{R}^d$, i.e., assume that $\|u = (u^1,\ldots,u^d)\| = 1$.
Let $M^u$ be a measurement corresponding to the spectral decomposition of $L(u) \stackrel{\mathrm{def}}{=} \sum_{j=1}^d u^jL_{\theta,j,s}$.
Show that the Fisher information satisfies

$$J^{M^u}_\theta \ge \frac{1}{\langle u|J_{\theta,s}|u\rangle}\,J_{\theta,s}|u\rangle\langle u|J_{\theta,s}. \tag{6.107}$$


6.47 Let $M$ and $M'$ be POVMs $\{M_\omega\}$ and $\{M'_{\omega'}\}$ with probability spaces $\Omega$ and
$\Omega'$, respectively. Let $M''$ be a POVM that performs the measurements $M$, $M'$ with
probability $\lambda$, $(1-\lambda)$. Show that $\lambda J^M_\theta + (1-\lambda)J^{M'}_\theta = J^{M''}_\theta$.


6.48 Consider the set of vectors $u_1,\ldots,u_k \in \mathbb{R}^d$ with norm 1 in parameter space.
Let $M^p$ be the POVM corresponding to the probabilistic mixture of the spectral
decompositions of $L(u_i)$ with probability $p_i$. Show that the Fisher information matrix satisfies

$$J^{M^p}_\theta \ge \sum_{i=1}^k p_i\,\frac{1}{\langle u_i|J_{\theta,s}|u_i\rangle}\,J_{\theta,s}|u_i\rangle\langle u_i|J_{\theta,s}. \tag{6.108}$$


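6.49 Using the result of the preceding exercise, show the following equation
regardless of the number of parameters when $\dim\mathcal{H} = 2$ [24, 46-48]:

$$\inf\bigl\{\mathrm{tr}\,(J^M_\theta)^{-1}\,\bigm|\, M\ \mathrm{POVM\ on}\ \mathcal{H}\bigr\} = \Bigl(\mathrm{tr}\,J_{\theta,s}^{-\frac12}\Bigr)^2.$$

6.50 Using the result of the preceding exercise, show the following equation under
the above assumption [24, 46-48].

$$\inf\bigl\{\mathrm{tr}\,G(J^M_\theta)^{-1}\,\bigm|\, M\ \mathrm{POVM\ on}\ \mathcal{H}\bigr\} = \Bigl(\mathrm{tr}\,\bigl(\sqrt{G}\,J_{\theta,s}^{-1}\sqrt{G}\bigr)^{\frac12}\Bigr)^2. \tag{6.109}$$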


6.51 Let $\{p_\theta(i)|\theta = (\theta^1,\ldots,\theta^d) \in \mathbb{R}^d\}$ be a probability family and $U_i$ ($i =
1,\ldots,k$) be unitary matrices in $\mathcal{H}$. Consider the TP-CP map $\kappa_\theta: \rho \mapsto \sum_{i=1}^k p_\theta(i)
U_i\rho U_i^*$. Then, let $J_{\theta,\rho,x}$ be the Fisher information matrix of the quantum state family
$\{\kappa_\theta(\rho)|\theta \in \mathbb{R}^d\}$ for $x = s, r, b, \lambda, p$, and $J_\theta$ be the Fisher information matrix of the
probability distribution family $\{p_\theta(i)|\theta = (\theta^1,\ldots,\theta^d) \in \mathbb{R}^d\}$. Show that

$$J_\theta \ge J_{\theta,\rho,x}. \tag{6.110}$$

6.52 Show that the equality in (6.110) holds if

$$\mathrm{Tr}\,U_i\rho U_i^*U_j\rho U_j^* = 0 \tag{6.111}$$

holds for $i \ne j$. In addition, let $P_i$ be the projection to the range of $U_i\rho U_i^*$. Show that
the output distribution of the PVM $\{P_i\}$ is equal to $p_\theta(i)$, i.e., the problem reduces
to estimating this probability distribution.

6.53 Consider the problem of estimating the probability distribution $p_\theta$ with the
generalized Pauli channel $\kappa_{p_\theta}$ given in Example 5.8 as the estimation of the channel
$\kappa_{p_\theta}\otimes\iota_A$. Show that (6.111) holds if $\rho = |\Phi_d\rangle\langle\Phi_d|$, where $|\Phi_d\rangle = \frac{1}{\sqrt{d}}\sum_{i=1}^d |u_i\rangle\otimes|u_i\rangle$
and $d = \dim\mathcal{H}_A$.

6.54 As in the preceding problem, show that no estimator can improve the estimation
accuracy of the estimator with the input state $(|\Phi_d\rangle\langle\Phi_d|)^{\otimes n}$, even though any entangled
state is input to a channel $(\kappa_{p_\theta}\otimes\iota_A)^{\otimes n}$ defined with respect to the generalized Pauli
channel $\kappa_{p_\theta}$.


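6.55 Prove (6.99) following the steps below [7].
(a) Show that an arbitrary real antisymmetric matrix $Y$ may be rewritten as

$$VYV^T = \begin{pmatrix} 0 & \alpha_1 & & & \\ -\alpha_1 & 0 & & & \\ & & 0 & \alpha_2 & \\ & & -\alpha_2 & 0 & \\ & & & & \ddots \end{pmatrix}$$

for a suitable real orthogonal matrix $V$.
(b) Show that

$$\min_{X:\ \text{real symmetric matrix}}\{\mathrm{Tr}\,X \,|\, X - iY \ge 0\} = \mathrm{Tr}\,|iY| \tag{6.112}$$

for a real antisymmetric matrix $Y$.
(c) Show (6.99) using (b).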


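6.56 Define the $D_\theta$ operator as

$$E_{\theta,s}D_\theta(X) = i[X,\rho_\theta]. \tag{6.113}$$

Let $T_\theta$ be the space spanned by $\{L_{\theta,1,s},\ldots,L_{\theta,d,s}\}$. Let $\overline{T}_\theta$ be the orbit of $T_\theta$ with
respect to the action of $D_\theta$.

Show the following items.
(a) Show the following:

$$\min_X\Bigl\{\mathrm{tr}\,\mathrm{Re}\,V_\theta(X) + \mathrm{tr}\,|\mathrm{Im}\,V_\theta(X)| \,\Bigm|\, \delta^j_i = \mathrm{Tr}\,\frac{\partial\rho_\theta}{\partial\theta^i}X^j\Bigr\}
= \min_{X:\ X^j\in\overline{T}_\theta}\Bigl\{\mathrm{tr}\,\mathrm{Re}\,V_\theta(X) + \mathrm{tr}\,|\mathrm{Im}\,V_\theta(X)| \,\Bigm|\, \delta^j_i = \mathrm{Tr}\,\frac{\partial\rho_\theta}{\partial\theta^i}X^j\Bigr\}. \tag{6.114}$$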


(b) Show the following: 


min. { Re V(X) + tr| Im Vo(X)| 


X:X/JET, 


Ope 
j 
§] = Tr EX | 


1 
=— min {1rRe Vo(X) + tr| Im Vo(X)| 


N X:XiETs 


§=Tr as xi |. (6.115) 


6.57 Show inequality (6.101) following the steps below [7, 16].

(a) Show that the RHS of (6.101) has the same value as the original even if we add
the constraint condition $\mathrm{Tr}\,\rho_\theta X^i = 0$ to the minimization.

(b) Assume that an estimator $(M,\hat\theta)$ and a vector $X = (X^i)$ of Hermitian matrices
satisfy $X^i = \sum_\omega \hat\theta^i(\omega)M_\omega$. Show that $V_\theta(M,\hat\theta) \ge V_\theta(X)$.

(c) Show that $\mathrm{tr}\,V_\theta(M,\hat\theta) \ge \mathrm{tr}(\mathrm{Re}\,V_\theta(X) + |\mathrm{Im}\,V_\theta(X)|)$.

(d) Show (6.101).


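6.58 Show that the equality in (6.101) holds for a pure state following the steps
below [49-52].
(a) Let $\rho_\theta = |u\rangle\langle u|$ and let $X = (X^i)$ be a vector of Hermitian matrices satisfying
$\mathrm{Tr}\,\rho_\theta X^i = 0$. Show that the vectors $x^i \stackrel{\mathrm{def}}{=} X^iu$ are orthogonal to $|u\rangle$ and satisfy
$V_\theta(X) = (\langle x^i|x^j\rangle)$.

(b) Choose $u_j$ such that $\frac{\partial\rho_\theta}{\partial\theta^j} = (|u_j\rangle\langle u| + |u\rangle\langle u_j|)/2$ with the condition $\langle u|u_j\rangle = 0$.
Define the matrix $V(x) := (\langle x^i|x^j\rangle)$. Show that

$$\min_X\Bigl\{\mathrm{tr}\,\mathrm{Re}\,V_\theta(X) + \mathrm{tr}\,|\mathrm{Im}\,V_\theta(X)| \,\Bigm|\, \delta^j_i = \mathrm{Tr}\,\frac{\partial\rho_\theta}{\partial\theta^i}X^j\Bigr\}
= \min_{x=(x^1,\ldots,x^d)}\bigl\{\mathrm{tr}\,\mathrm{Re}\,V(x) + \mathrm{tr}\,|\mathrm{Im}\,V(x)| \,\bigm|\, \delta^i_j = \langle x^i|u_j\rangle\bigr\}.$$

(c) Consider the case where all $\langle x^i|x^j\rangle$ are real. Suppose that orthogonal vectors $v_k$
are real linear sums of the vectors $u, x^1,\ldots,x^d$, and each $\langle v_k|u\rangle$ is nonzero. Then, we
make the POVM $\{|v_k\rangle\langle v_k|\}$. Show that the estimator $(M,\hat\theta) \stackrel{\mathrm{def}}{=} \Bigl(\{|v_k\rangle\langle v_k|\}, \frac{\langle v_k|x^j\rangle}{\langle v_k|u\rangle}\Bigr)$
satisfies $E_\theta(M,\hat\theta)u = x^j$. Also, show that $V_\theta(M,\hat\theta) = V(x)$.
(d) Let $x^1,\ldots,x^d$ be a set of vectors such that $\langle x^i|x^j\rangle$ are not necessarily real. Show
that there exists a set of vectors $w^1,\ldots,w^d$ in another $d$-dimensional space such
that $|\mathrm{Im}\,V(x)| - i\,\mathrm{Im}\,V(x) = (\langle w^i|w^j\rangle)$.
(e) Under the same assumption as (d), show that $\langle y^i|y^j\rangle$ are all real, where $y^i \stackrel{\mathrm{def}}{=}
x^i \oplus w^i$ and $\oplus$ denotes the direct sum.

(f) For a given set of vectors $x^1,\ldots,x^d$, show that there exists an estimator $(M,\hat\theta)$
such that $E_\theta(M,\hat\theta)u = x^j$ and $V_\theta(M,\hat\theta) = \mathrm{Re}(V(x)) + |\mathrm{Im}(V(x))|$.

(g) Show that the equality in (6.101) holds for a pure state.

(h) Examine the state family consisting of pure states with $2l$ parameters, where
$l$ is the dimension of the Hilbert space. Show that the RHS of (6.101) is equal to
$\mathrm{tr}(\mathrm{Re}\,J)^{-1} + \mathrm{tr}\,|(\mathrm{Re}\,J)^{-1}\,\mathrm{Im}\,J\,(\mathrm{Re}\,J)^{-1}|$, where $J = (J_{i,j} \stackrel{\mathrm{def}}{=} \langle u_i|u_j\rangle)$.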


6.7 Relative Modular Operator and Quantum f-Relative Entropy


6.7.1 Monotonicity Under Complete Positivity


In this section, we introduce the relative modular operator and the quantum f-relative
entropy, and investigate their properties. The content requires only Sect. 6.1
from this chapter, but its flavor is so different from the main topic of the chapter that
it cannot be placed directly after Sect. 6.1. Since the topic is not related to the
other sections in this chapter, and is related only to Sect. 5.4, we discuss it
at the end of this chapter.

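For this purpose, we focus on two density matrices $\rho = \sum_i a_i|u_i\rangle\langle u_i|$ and
$\sigma = \sum_j b_j|v_j\rangle\langle v_j|$. Given a matrix convex function $f$ defined on $[0,\infty)$, we
define the quantum f-relative entropy $D_f(\rho\|\sigma) \stackrel{\mathrm{def}}{=} \sum_{i,j} f\bigl(\frac{a_i}{b_j}\bigr)Q_{(\rho,\sigma)}(i,j)$, where
$Q_{(\rho,\sigma)}(i,j) \stackrel{\mathrm{def}}{=} b_j|\langle v_j|u_i\rangle|^2$ [53] (see Sect. 3.2 for the details of the notation $Q_{(\rho,\sigma)}$).
When the matrix convex function $f$ is defined only on $(0,\infty)$, i.e., it diverges at 0,
the quantum f-relative entropy $D_f(\rho\|\sigma)$ can be defined only when $P_\rho \ge P_\sigma$, where $P_\rho$ is
the projection onto the image of $\rho$.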

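To analyze the quantum f-relative entropy, we define the two superoperators $L_\sigma$
and $R_\rho$ as linear maps on the matrix space $M_{\sigma,r}$:

$$L_\sigma(X) \stackrel{\mathrm{def}}{=} \sigma X, \qquad R_\rho(X) \stackrel{\mathrm{def}}{=} X\rho.$$

Using these superoperators, we define the relative modular operator $\Delta_{\rho,\sigma} \stackrel{\mathrm{def}}{=} L_\sigma^{-1}R_\rho$.
By using the relative modular operator, we define another superoperator $f(\Delta_{\rho,\sigma})$.
So, the quantum f-relative entropy $D_f(\rho\|\sigma)$ can be rewritten as

$$D_f(\rho\|\sigma) = \mathrm{Tr}\,f(\Delta_{\rho,\sigma})(\sigma). \tag{6.116}$$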


Similar to Sect. 2.1.2, when we choose the matrix convex function $f(x) = x\log x$
defined on $[0,\infty)$, the quantum f-relative entropy $D_f(\rho\|\sigma)$ is the quantum relative
entropy $D(\rho\|\sigma)$.
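The spectral-sum definition of $D_f(\rho\|\sigma)$, the relative modular operator form (6.116), and the reduction to the quantum relative entropy for $f(x) = x\log x$ can be compared numerically. The following is a sketch under stated assumptions: both states are taken to be full rank, $X$ is vectorized row by row so that $X \mapsto \sigma^{-1}X\rho$ becomes the matrix $\sigma^{-1}\otimes\rho^T$, and all function names are chosen here for illustration.

```python
import numpy as np

def herm_fun(a, fun):
    """Apply a scalar function to a Hermitian matrix through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * fun(w)) @ v.conj().T

def random_state(dim, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def d_f_spectral(rho, sigma, f):
    """sum_{i,j} f(a_i / b_j) b_j |<v_j|u_i>|^2."""
    a, u = np.linalg.eigh(rho)
    b, v = np.linalg.eigh(sigma)
    overlap = np.abs(v.conj().T @ u) ** 2   # overlap[j, i] = |<v_j|u_i>|^2
    ratios = a[None, :] / b[:, None]        # ratios[j, i] = a_i / b_j
    return float(np.sum(f(ratios) * b[:, None] * overlap))

def d_f_modular(rho, sigma, f):
    """Tr f(Delta)(sigma), with Delta: X -> sigma^{-1} X rho acting on row-major vec(X)."""
    d = rho.shape[0]
    delta = np.kron(np.linalg.inv(sigma), rho.T)   # Hermitian and positive here
    f_delta = herm_fun(delta, f)
    return np.trace((f_delta @ sigma.reshape(-1)).reshape(d, d)).real

rng = np.random.default_rng(2)
rho, sigma = random_state(3, rng), random_state(3, rng)
xlogx = lambda x: x * np.log(x)

via_sum = d_f_spectral(rho, sigma, xlogx)
via_delta = d_f_modular(rho, sigma, xlogx)
relative_entropy = np.trace(
    rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log))).real

print(via_sum, via_delta, relative_entropy)
```

All three printed numbers should agree up to round-off. Replacing `xlogx` by `lambda x: x**2` gives $\mathrm{Tr}\,\rho^2\sigma^{-1}$ in both forms.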

Then, we have the following monotonicity for quantum f-relative entropy.

Theorem 6.12 (Petz [53]) For a TP-CP map $\kappa$, the monotonicity relation

$$D_f(\kappa(\rho)\|\kappa(\sigma)) \le D_f(\rho\|\sigma) \tag{6.117}$$

holds for a matrix convex function $f$ defined on $[0,\infty)$ when $P_\rho \le P_\sigma$. When the
matrix convex function $f$ is defined only on $(0,\infty)$, (6.117) holds under the assumption
$P_\rho = P_\sigma$.
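The monotonicity (6.117) can be observed numerically for the partial trace, which is a TP-CP map. The sketch below is an illustration rather than a proof: it draws random full-rank two-qubit states (so that $P_\rho = P_\sigma = I$), uses the spectral-sum form of $D_f$, and tries three matrix convex functions. The helper names and the random seed are assumptions.

```python
import numpy as np

def random_state(dim, rng):
    """Hypothetical helper: a random full-rank density matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def f_divergence(rho, sigma, f):
    """D_f(rho||sigma) = sum_{i,j} f(a_i / b_j) b_j |<v_j|u_i>|^2 for full-rank states."""
    a, u = np.linalg.eigh(rho)
    b, v = np.linalg.eigh(sigma)
    overlap = np.abs(v.conj().T @ u) ** 2
    return float(np.sum(f(a[None, :] / b[:, None]) * b[:, None] * overlap))

def partial_trace_b(x, dim_a, dim_b):
    """Trace out the second tensor factor of an operator on H_A (x) H_B."""
    return np.trace(x.reshape(dim_a, dim_b, dim_a, dim_b), axis1=1, axis2=3)

rng = np.random.default_rng(1)
rho, sigma = random_state(4, rng), random_state(4, rng)
rho_a, sigma_a = partial_trace_b(rho, 2, 2), partial_trace_b(sigma, 2, 2)

functions = {
    "x log x": lambda x: x * np.log(x),
    "x^2": lambda x: x ** 2,
    "-sqrt(x)": lambda x: -np.sqrt(x),
}
# For each f: (D_f after the partial trace, D_f before the partial trace).
results = {name: (f_divergence(rho_a, sigma_a, f), f_divergence(rho, sigma, f))
           for name, f in functions.items()}

for name, (reduced, full) in results.items():
    print(f"{name}: reduced = {reduced:.6f}, full = {full:.6f}")
```

In each row the reduced value should not exceed the full value.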


As another choice, the function $f(x) = x^\alpha$ with $\alpha \in [1,2]$ is a matrix
convex function defined on $[0,\infty)$. Then, the quantum f-relative entropy $D_f(\rho\|\sigma)$
is $e^{(\alpha-1)D_\alpha(\rho\|\sigma)} = e^{\phi(\alpha-1|\rho\|\sigma)}$. The function $f(x) = -x^\alpha$ with $\alpha \in [0,1]$ is also a
matrix convex function defined on $[0,\infty)$. The quantum f-relative entropy $D_f(\rho\|\sigma)$
is $-e^{(\alpha-1)D_\alpha(\rho\|\sigma)} = -e^{\phi(\alpha-1|\rho\|\sigma)}$. Then, we obtain (5.52) and (5.53). The functions
$f(x) = x^\alpha$ with $\alpha \in [-1,0)$ and $f(x) = -\log x$ are matrix convex functions
defined only on $(0,\infty)$.

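To show Theorem 6.12, we focus on the inner product $\langle\ ,\ \rangle^{(e)}_{\sigma,r}$ and the space
$M^{(m)}_{\sigma,r}(\mathcal{H})$, which can be identified with the quotient space $M_{\sigma,r}(\mathcal{H})$. The map $\Delta_{\rho,\sigma}$
is positive Hermitian under the inner product $\langle Y,X\rangle^{(e)}_{\sigma,r} = \mathrm{Tr}\,Y^*\sigma X$ because

$$\langle X,\Delta_{\rho,\sigma}X\rangle^{(e)}_{\sigma,r} = \mathrm{Tr}\,X^*\sigma\sigma^{-1}X\rho = \mathrm{Tr}\,X^*P_\sigma X\rho \ge 0.$$

Then, the quantum f-relative entropy $D_f(\rho\|\sigma)$ can be rewritten as

$$D_f(\rho\|\sigma) = \langle P_\sigma, f(\Delta_{\rho,\sigma})P_\sigma\rangle^{(e)}_{\sigma,r} \tag{6.118}$$

when $P_\rho \le P_\sigma$.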
Now, we prepare the following lemmas. 


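Lemma 6.3 When $\kappa$ is the partial trace from $\mathcal{H}_{A,B} = \mathcal{H}_A\otimes\mathcal{H}_B$ to $\mathcal{H}_A$, a matrix
convex function $f$ defined on $[0,\infty)$ satisfies that

$$\mathrm{Tr}\,X^*\kappa(\sigma)f(\Delta_{\kappa(\rho),\kappa(\sigma)})(X) \le \mathrm{Tr}\,(X^*\otimes I)\sigma f(\Delta_{\rho,\sigma})(X\otimes I) \tag{6.119}$$

for $X \in M^{(m)}_{\kappa(\sigma),r}(\mathcal{H}_A)$ when $P_\rho \le P_\sigma$.


Lemma 6.4 When $\kappa$ is the partial trace from $\mathcal{H}_{A,B} = \mathcal{H}_A\otimes\mathcal{H}_B$ to $\mathcal{H}_A$, a matrix
convex function $f$ defined on $[0,\infty)$ satisfies that

$$\sigma^s f(\Delta_{\rho,\sigma})(\sigma^t) = f(\Delta_{\rho,\sigma})(\sigma^{s+t}) = \sigma^{s+t}f(\Delta_{\rho,\sigma})(P_\sigma) = \sigma^{s+t}f(\Delta_{\rho,\sigma})(P) \tag{6.120}$$

for $s,t > 0$ when $P$ is a projection satisfying $P \ge P_\sigma$.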


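Proof of Theorem 6.12 Now, we show Theorem 6.12 by using Lemma 6.3. Choose
the Stinespring representation of the TP-CP map $\kappa$ as $\kappa(\rho) = \mathrm{Tr}_E\,U(\rho\otimes\rho_0)U^*$.
Substituting $P_{\kappa(\sigma)}$ into $X$ in Lemma 6.3, we have

$$\begin{aligned}
D_f(\kappa(\rho)\|\kappa(\sigma)) &= \mathrm{Tr}\,f(\Delta_{\kappa(\rho),\kappa(\sigma)})(\kappa(\sigma)) \\
&\stackrel{(a)}{=} \mathrm{Tr}\,P_{\kappa(\sigma)}\kappa(\sigma)f(\Delta_{\kappa(\rho),\kappa(\sigma)})(P_{\kappa(\sigma)}) \\
&\stackrel{(b)}{\le} \mathrm{Tr}\,(P_{\kappa(\sigma)}\otimes I)U(\sigma\otimes\rho_0)U^*f(\Delta_{U(\rho\otimes\rho_0)U^*,U(\sigma\otimes\rho_0)U^*})(P_{\kappa(\sigma)}\otimes I) \\
&\stackrel{(c)}{=} \mathrm{Tr}\,f(\Delta_{U(\rho\otimes\rho_0)U^*,U(\sigma\otimes\rho_0)U^*})(U(\sigma\otimes\rho_0)U^*) = \mathrm{Tr}\,f(\Delta_{\rho\otimes\rho_0,\sigma\otimes\rho_0})(\sigma\otimes\rho_0) \\
&= \mathrm{Tr}\,(f(\Delta_{\rho,\sigma})(\sigma))\otimes\rho_0 = D_f(\rho\|\sigma).
\end{aligned} \tag{6.121}$$

Here, (a) and (c) follow from Lemma 6.4, and (b) follows from Lemma 6.3. ∎
Now, we show (6.17) by using Lemma 6.3.
Proof of (6.17) Substituting $\rho^{A,B}$ into $\rho$ and $\sigma$, and $-x^s$ into $f(x)$, we have

$$-\mathrm{Tr}_A\,X^*(\mathrm{Tr}_B\,\rho^{A,B})^{1-s}X(\mathrm{Tr}_B\,\rho^{A,B})^s
\le -\mathrm{Tr}_{A,B}\,(X\otimes I_B)^*(\rho^{A,B})^{1-s}(X\otimes I_B)(\rho^{A,B})^s \tag{6.122}$$

because $f(L^{-1}_{\rho^{A,B}}R_{\rho^{A,B}})X = -(\rho^{A,B})^{-s}X(\rho^{A,B})^s$. Hence, we obtain (6.17). ∎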


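Proof of Lemma 6.3 Although the map $\kappa^*(X) = X\otimes I$ is defined as the dual of $\kappa$
with respect to the Hilbert-Schmidt inner product, the dual $\kappa^*_{\sigma,r}$ of $\kappa_{\sigma,r}$ with respect
to the inner products $\langle\ ,\ \rangle^{(e)}_{\sigma,r}$ and $\langle\ ,\ \rangle^{(e)}_{\kappa(\sigma),r}$ is also $\kappa^*$. Note that $\kappa_{\sigma,r}$ is a map from
$M^{(m)}_{\sigma,r}(\mathcal{H}_{A,B})$ to $M^{(m)}_{\kappa(\sigma),r}(\mathcal{H}_A)$. This is because

$$\begin{aligned}
\langle Y,\kappa_{\sigma,r}(X)\rangle^{(e)}_{\kappa(\sigma),r} &= \mathrm{Tr}_A\,Y^*\kappa(\sigma)\kappa_{\sigma,r}(X) \\
&= \mathrm{Tr}_A\,Y^*\kappa(\sigma X) = \mathrm{Tr}_A\,Y^*\,\mathrm{Tr}_B(\sigma X) = \mathrm{Tr}\,(Y^*\otimes I)(\sigma X) = \langle Y\otimes I, X\rangle^{(e)}_{\sigma,r}
\end{aligned} \tag{6.123}$$

for $X \in M^{(m)}_{\sigma,r}(\mathcal{H}_{A,B})$ and $Y \in M^{(m)}_{\kappa(\sigma),r}(\mathcal{H}_A)$. Since $P_\rho \le P_\sigma$, we have

$$\begin{aligned}
\kappa_{\sigma,r}\circ\Delta_{\rho,\sigma}\circ\kappa^*_{\sigma,r}(Y) &= \kappa_{\sigma,r}\circ\Delta_{\rho,\sigma}(Y\otimes I) = \kappa_{\sigma,r}(\sigma^{-1}(Y\otimes I)\rho) \\
&= \kappa(\sigma)^{-1}\kappa(\sigma\sigma^{-1}(Y\otimes I)\rho) = \kappa(\sigma)^{-1}\kappa(P_\sigma(Y\otimes I)\rho) \\
&= (\mathrm{Tr}_B\,\sigma)^{-1}Y(\mathrm{Tr}_B\,\rho) = \Delta_{\kappa(\rho),\kappa(\sigma)}Y
\end{aligned} \tag{6.124}$$

for $Y \in M^{(m)}_{\kappa(\sigma),r}(\mathcal{H}_A)$.

Since $\kappa_{\sigma,r}\circ\kappa^*_{\sigma,r}(Y) = \kappa_{\sigma,r}(Y\otimes I) = (\mathrm{Tr}_B(\sigma))^{-1}\,\mathrm{Tr}_B\,\sigma(Y\otimes I) = (\mathrm{Tr}_B(\sigma))^{-1}
(\mathrm{Tr}_B\,\sigma)Y = Y$ for $Y \in M^{(m)}_{\kappa(\sigma),r}(\mathcal{H}_A)$, $\kappa_{\sigma,r}\circ\kappa^*_{\sigma,r}$ is the identity operator on the space
$M^{(m)}_{\kappa(\sigma),r}(\mathcal{H}_A)$.

Applying Theorem A.1 to the operators $\Delta_{\rho,\sigma}$ and $\kappa^*_{\sigma,r}$ on the space
$M^{(m)}_{\sigma,r}(\mathcal{H}_{A,B})$, we obtain

$$\begin{aligned}
\mathrm{Tr}\,X^*\kappa(\sigma)f(\Delta_{\kappa(\rho),\kappa(\sigma)})(X) &= \mathrm{Tr}\,X^*\kappa(\sigma)f(\kappa_{\sigma,r}\Delta_{\rho,\sigma}\kappa^*_{\sigma,r})(X) \\
&= \langle X, f(\kappa_{\sigma,r}\Delta_{\rho,\sigma}\kappa^*_{\sigma,r})X\rangle^{(e)}_{\kappa(\sigma),r} \le \langle X, \kappa_{\sigma,r}f(\Delta_{\rho,\sigma})\kappa^*_{\sigma,r}X\rangle^{(e)}_{\kappa(\sigma),r} \\
&= \langle\kappa^*_{\sigma,r}X, f(\Delta_{\rho,\sigma})\kappa^*_{\sigma,r}X\rangle^{(e)}_{\sigma,r} = \langle X\otimes I, f(\Delta_{\rho,\sigma})(X\otimes I)\rangle^{(e)}_{\sigma,r} \\
&= \mathrm{Tr}\,(X^*\otimes I)\sigma f(\Delta_{\rho,\sigma})(X\otimes I).
\end{aligned}$$

∎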

Proof of Lemma 6.4 We have . 
0° f(Apo)(o") = 0° f(L5') 0 f(Rp)(o") = 0° f(a "Jo" f(p) 
=fo "(oO =74p oe): 
Since 
Fie le foar" fe \PiMs=o" fe JR I@: 
we can show the remaining relations. | 


6.7.2 Monotonicity Under 2-Positivity


Next, we relax the condition for the map $\kappa$ to 2-positivity. This relaxation
seems to have no physical meaning because it is too mathematical. However,
this analysis is very useful for deriving the equality condition discussed in Theorem
5.8 even under complete positivity. For this purpose, a function $f$ defined on
$(0,\infty)$ is called sub-linear when $\lim_{x\to\infty} f(x)/x = 0$.


Theorem 6.13 ([54, 55]) For any TP-2-positive map $\kappa$, the monotonicity relation

$$D_f(\kappa(\rho)\|\kappa(\sigma)) \le D_f(\rho\|\sigma) \tag{6.125}$$

holds when one of the following conditions holds.

① $f$ is a sub-linear matrix convex function defined on $[0,\infty)$, e.g., $f(x) = -x^\alpha$
with $\alpha \in [0,1)$.

② $f$ is a sub-linear matrix convex function defined on $(0,\infty)$ (e.g., $f(x) = x^\alpha$
with $\alpha \in [-1,0)$, $f(x) = -\log x$), and $P_\rho \ge P_\sigma$.

③ $f$ is a matrix convex function defined on $[0,\infty)$ (e.g., $f(x) = x^\alpha$ with $\alpha \in (1,2]$,
$f(x) = x\log x$), and $P_\rho \le P_\sigma$.

④ $f$ is a matrix convex function defined on $(0,\infty)$, and $P_\rho = P_\sigma$.


Indeed, the advantage of Theorem 6.13 over Theorem 6.12 is not limited to the
condition for the map $\kappa$. Theorem 6.13 also relaxes the condition for the projections
$P_\rho$ and $P_\sigma$ when the matrix convex function $f$ is sub-linear. The detailed treatment of
Theorem 6.13 enables such a subtle analysis. Further, we have the following equality
condition. In the following discussion, the extremal decomposition given in Theorem
A.2 plays an essential role.




Theorem 6.14 ((54, 55]) For a TP-2-positive map k, the following conditions are 
equivalent 


@® Equality in (6.125) holds for any f;(x) = a with an arbitrary t > 0. 

@ There exists a real number a € (0, 1) such that the equality in (6.125) holds for 
f(x) = -x% 

@ Equality in (6.125) holds when the matrix convex function f is a sub-linear 
matrix convex function defined on [0, 00). 


@ The relation P,K*(K(a)' Px) k(p)')) = 0 ' Pp! holds for any t > 0. 
When the relation P, < P, holds, Condition @ be simplified as follows. 
@’ The relation P,K*(K(a)~'K(p)')) = a‘ p! holds for any t > 0. 
Under this assumption, Conditions ©-@ are equivalent to the following conditions. 


@ Equality in (6.125) holds for f (x) = x log x. 

© There exists a real number a € (1, 2) such that the equality in (6.125) holds for 
f(x) =x", 

@ Equality in (6.125) holds when the matrix convex function f is a convex function 
defined on [0, ov). 

® PoK*((log k(p) — log K(o)) Prp)) = (log p — logo) Py. 

When the relation P, > P, holds, Conditions ©@-@ are equivalent to the following 
conditions. 


@ Equality in (6.125) holds for any f,(x) = i with an arbitrary t > 0. 

@ Equality in (6.125) holds for f(x) = —logx. 

© There exists a real number a € [—1, 0) such that the equality in (6.125) holds 
for f(x) = x®. 

® Equality in (6.125) holds when the matrix convex function f is a sub-linear 
matrix convex function defined on (0, 00). 

© The relation P,K*(K(a)' Pay k(p)')) = 0 ' Pp! holds for any real number t. 

@ P,K*(Pyo) log K(p) — log K(a))) = P, (log p — logo). 


When the relation P, = Pz holds, Conditions ®-®, @—® are equivalent to the 
following condition. 


@ Equality in (6.125) holds when the matrix convex function f is a convex function 
defined on (0, 00). 


As the special case of TP-CP maps, we obtain the following corollary. 


Corollary 6.1 ([54]) For any TP-CP map $\kappa$, the following conditions are equivalent
when $P_\rho \le P_\sigma$.

① Conditions given in Theorem 6.14 with $P_\rho \le P_\sigma$ hold.
② The relation $\sigma^{1/2}\kappa^*(\kappa(\sigma)^{-1/2}\kappa(\rho)\kappa(\sigma)^{-1/2})\sigma^{1/2} = \rho$ holds.


When a is invertible, Condition @ of Corollary 6.1 is rewritten as Ke at (p) = p. 


That is, this can be interpreted via the conditional expectation with respect to the
inner product x = 5. However, when $\sigma$ is not invertible, we cannot define the dual
* 
map kK 1 


Proof of Corollary 6.1 We show @=>@. Assume @ for a TP-CP map « from 714 
to Hg and states p and o on Hy. Then, we define the TP-CP map 7; from Hg 
to Ha as 1(p') = Pry p’ Pio) + (Ir p’'U — Pioy))K(o). We denote the image 
of Pic) by Hz. So, we define the TP-CP map 7) from H’, to H, as 72(p") := 
aK (K(a) 7/29" K(a)7"/?)a!/. Since TH) = K(a) PK 71a?) Ka) 1? = 
I, T2 is trace-preserving. So, we have 72 07 (K(p)) = p and 72 07)(K(o)) = a. Since 


D(pllo) = D(K(p)||K(o)) = D(z 0 T1(K(p))[l72 0 T1(K())) = D(pllo), 


we obtain ©. 

Next, we show ©=>©. Firstly, we show it only when x« is the partial trace from 
Hap = Ha ® He to Hy. In this case, the map « satisfies Condition @ of Theorem 
5.3. So, © with tf = 1/2 implies that 


P66 *(6(0) 7? Paco) (P) Paioy (0) 7) Py 
=P, 6 *(K(0)/? Pai) (p)'/7)) K*(6(p) /7) Papaya) !?) P, 
=o? Ppl? pl? Pag? — og ? PpP,a (6.126) 


Multiplying c!/? from both sides, we obtain ©. 


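Next, we consider the general case. We choose the Stinespring representation of
the TP-CP map $\kappa$ as $\kappa(\rho) = \mathrm{Tr}_E\,U(\rho\otimes\rho_0)U^*$. So, we have


$$P_{U(\sigma\otimes\rho_0)U^*}\big((\kappa(\sigma)^{-1/2}P_{\kappa(\sigma)}\kappa(\rho)P_{\kappa(\sigma)}\kappa(\sigma)^{-1/2})\otimes I_E\big)P_{U(\sigma\otimes\rho_0)U^*}$$
$$= (U(\sigma\otimes\rho_0)U^*)^{-1/2}P_{U(\sigma\otimes\rho_0)U^*}\,U(\rho\otimes\rho_0)U^*\,P_{U(\sigma\otimes\rho_0)U^*}(U(\sigma\otimes\rho_0)U^*)^{-1/2}. \qquad (6.127)$$

Since $P_{\sigma\otimes\rho_0} = P_\sigma\otimes P_{\rho_0}$, applying the unitary $U$ and $U^*$, we have


$$(P_\sigma\otimes I)(I\otimes P_{\rho_0})U^*\big((\kappa(\sigma)^{-1/2}P_{\kappa(\sigma)}\kappa(\rho)P_{\kappa(\sigma)}\kappa(\sigma)^{-1/2})\otimes I_E\big)U(I\otimes P_{\rho_0})(P_\sigma\otimes I)$$
$$= P_{\sigma\otimes\rho_0}U^*\big((\kappa(\sigma)^{-1/2}P_{\kappa(\sigma)}\kappa(\rho)P_{\kappa(\sigma)}\kappa(\sigma)^{-1/2})\otimes I_E\big)UP_{\sigma\otimes\rho_0}$$
$$= (\sigma\otimes\rho_0)^{-1/2}P_{\sigma\otimes\rho_0}(\rho\otimes\rho_0)P_{\sigma\otimes\rho_0}(\sigma\otimes\rho_0)^{-1/2}$$
$$= (\sigma^{-1/2}P_\sigma\rho P_\sigma\sigma^{-1/2})\otimes P_{\rho_0}. \qquad (6.128)$$


Since $\kappa^*(X)$ is given as $\mathrm{Tr}_E\,(I\otimes P_{\rho_0})U^*(X\otimes I_E)U(I\otimes P_{\rho_0})$, taking the partial trace,
we obtain


$$P_\sigma\kappa^*(\kappa(\sigma)^{-1/2}P_{\kappa(\sigma)}\kappa(\rho)P_{\kappa(\sigma)}\kappa(\sigma)^{-1/2})P_\sigma = \sigma^{-1/2}P_\sigma\rho P_\sigma\sigma^{-1/2}. \qquad (6.129)$$


Since $P_\rho \le P_\sigma$, we have $P_{\kappa(\rho)} \le P_{\kappa(\sigma)}$, which implies that

$$P_\sigma\kappa^*(\kappa(\sigma)^{-1/2}\kappa(\rho)\kappa(\sigma)^{-1/2})P_\sigma = \sigma^{-1/2}\rho\,\sigma^{-1/2}. \qquad (6.130)$$


Multiplying $\sigma^{1/2}$ from both sides, we obtain ②.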


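To prove the above arguments, we prepare the map $\hat\kappa$ from the matrix space $\mathcal{M}_{\kappa(\sigma),r}$
to the matrix space $\mathcal{M}_{\sigma,r}$ as


$$\hat\kappa(\kappa(\sigma)^{1/2}X) := \sigma^{1/2}\kappa^*(X) \qquad (6.131)$$


for $X \in \mathcal{M}_{\kappa(\sigma),r}$. That is, $\hat\kappa(X)$ is defined to be $\sigma^{1/2}\kappa^*(\kappa(\sigma)^{-1/2}X)$. From here, we
focus on the Hilbert-Schmidt inner product $\langle\ ,\ \rangle$ instead of $\langle\ ,\ \rangle^{(e)}_{\sigma,r}$ and $\langle\ ,\ \rangle^{(e)}_{\kappa(\sigma),r}$.


Then, we prepare two lemmas.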


Lemma 6.5 Given a TP-2-positive map $\kappa$, we have the following items.


(1) The monotonicity relation (6.125) with $f = f_t$ holds for $t > 0$.

(2) When $P_\rho \ge P_\sigma$, the monotonicity relation (6.125) with $f = f_t$ holds for $t = 0$.

(3) When $P_\rho \le P_\sigma$, the equality $D_f(\rho\|\sigma) = D_f(\kappa(\rho)\|\kappa(\sigma))$ holds with a function
$f = ax + b$.

(4) When $P_\rho \le P_\sigma$, the monotonicity relation (6.125) holds with the quadratic function
$f(x) = x^2$.


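Lemma 6.6 For any TP-2-positive map $\kappa$, we have the following matrix inequalities
on the matrix space $\mathcal{M}_{\kappa(\sigma),r}$ with respect to the Hilbert-Schmidt inner product $\langle\ ,\ \rangle$:


$$\hat\kappa^*\hat\kappa \le I_{\mathcal{M}_{\kappa(\sigma),r}} \qquad (6.132)$$
$$\hat\kappa^*\Delta_{\rho,\sigma}\hat\kappa \le \Delta_{\kappa(\rho),\kappa(\sigma)} \qquad (6.133)$$
$$(\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1} \le \hat\kappa^*(\Delta_{\rho,\sigma} + t)^{-1}\hat\kappa \qquad (6.134)$$


for $t > 0$.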


Proof of Theorem 6.13 Now, using Lemma 6.5, we show Theorem 6.13. (A.47)
of Theorem A.2 guarantees that any sub-linear matrix convex function defined on
$[0, \infty)$ can be written as a positive sum of functions $f_t$ with $t > 0$ and a constant.
So, (1) of Lemma 6.5 yields the desired argument under Condition ©.

Also, (A.45) of Theorem A.2 guarantees that any sub-linear matrix convex function
defined on $(0, \infty)$ can be written as a positive sum of functions $f_t$ with $t \ge 0$
and a constant. So, (2) of Lemma 6.5 yields the desired argument under Condition @.

Similarly, the combination of (A.46) of Theorem A.2 and (1), (3), (4) of Lemma
6.5 yields the desired argument under Condition ©.

Finally, the combination of (A.44) of Theorem A.2 and (2), (3), (4) of Lemma 6.5
yields the desired argument under Condition @.


Proof of Theorem 6.14 Step 1: Firstly, we discuss the case without the assumption
$P_\rho \le P_\sigma$ nor $P_\rho \ge P_\sigma$. The decomposition (A.37) guarantees the equivalence
between @ and @. For the same reason, the decomposition (A.47) in Theorem
A.2 guarantees the equivalence between @ and ©. Moreover, @ with $t \in (0, 1)$ yields
that
$$\mathrm{Tr}\,\kappa(\sigma)^{1-t}\kappa(\rho)^{t} = \mathrm{Tr}\,\kappa(\sigma P_\sigma)\kappa(\sigma)^{-t}P_{\kappa(\sigma)}\kappa(\rho)^{t}$$
$$= \mathrm{Tr}\,\sigma P_\sigma\kappa^*(\kappa(\sigma)^{-t}P_{\kappa(\sigma)}\kappa(\rho)^{t}) = \mathrm{Tr}\,\sigma\sigma^{-t}P_\sigma\rho^{t} = \mathrm{Tr}\,\sigma^{1-t}\rho^{t}, \qquad (6.135)$$
which implies @. Hence, it is sufficient to show that O ⇒ @.
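Assume that Condition ® holds. So, (6.149) implies that


$$\langle\kappa(\sigma)^{1/2}, (\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(\kappa(\sigma)^{1/2})\rangle$$
$$= \langle\kappa(\sigma)^{1/2}, \hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-1}\circ\hat\kappa(\kappa(\sigma)^{1/2})\rangle \qquad (6.136)$$


for $t > 0$. (6.134) and (6.136) imply that


$$(\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(\kappa(\sigma)^{1/2}) = \hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-1}\circ\hat\kappa(\kappa(\sigma)^{1/2})$$
$$= \hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2}). \qquad (6.137)$$


Taking the derivative for $t$ in this equation, we have
$$(\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-2}(\kappa(\sigma)^{1/2}) = \hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-2}(\sigma^{1/2}). \qquad (6.138)$$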
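Thus,


$$\langle\hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2}), \hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2})\rangle$$
$$\overset{(a)}{=} \langle(\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(\kappa(\sigma)^{1/2}), (\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(\kappa(\sigma)^{1/2})\rangle$$
$$= \langle\kappa(\sigma)^{1/2}, (\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-2}(\kappa(\sigma)^{1/2})\rangle \overset{(b)}{=} \langle\kappa(\sigma)^{1/2}, \hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-2}(\sigma^{1/2})\rangle$$
$$= \langle\hat\kappa(\kappa(\sigma)^{1/2}), (\Delta_{\rho,\sigma} + t)^{-2}(\sigma^{1/2})\rangle = \langle\sigma^{1/2}, (\Delta_{\rho,\sigma} + t)^{-2}(\sigma^{1/2})\rangle$$
$$= \langle(\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2}), (\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2})\rangle, \qquad (6.139)$$


where (a) and (b) follow from (6.137) and (6.138), respectively. Thus, the combination
of (6.132) and (6.139) implies that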


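$$\hat\kappa\circ\hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2}) = (\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2}) \overset{(a)}{=} \sigma^{1/2}(\Delta_{\rho,\sigma} + t)^{-1}(P_\sigma), \qquad (6.140)$$

where (a) follows from Lemma 6.4. Since (6.137) and (6.131) imply that
$\hat\kappa\circ\hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2}) = \hat\kappa\circ(\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(\kappa(\sigma)^{1/2}) = \sigma^{1/2}\kappa^*((\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(P_{\kappa(\sigma)}))$,
we have


$$\sigma^{1/2}\kappa^*((\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(P_{\kappa(\sigma)})) = \sigma^{1/2}(\Delta_{\rho,\sigma} + t)^{-1}(P_\sigma). \qquad (6.141)$$

That is,
$$P_\sigma\kappa^*((\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(P_{\kappa(\sigma)})) = P_\sigma(\Delta_{\rho,\sigma} + t)^{-1}(P_\sigma). \qquad (6.142)$$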


Due to the Stone-Weierstrass theorem, any continuous function can be approximated by
a sum of $f_t$. So, any continuous function $f$ defined on $[0, \infty)$ satisfies


$$P_\sigma\kappa^*(f(\Delta_{\kappa(\rho),\kappa(\sigma)})(P_{\kappa(\sigma)})) = P_\sigma f(\Delta_{\rho,\sigma})(P_\sigma). \qquad (6.143)$$


Applying $f(x) = x^t$, we obtain @. So, we obtain the required equivalence relations
in this case.

Step 2: Next, we discuss the case with the assumption $P_\rho \le P_\sigma$. We have
already shown the equivalence from © to @. @ is trivially simplified to @' due
to the condition $P_\rho \le P_\sigma$. In this case, a linear function $f$ satisfies the equality
$D_f(\rho\|\sigma) = D_f(\kappa(\rho)\|\kappa(\sigma))$ due to (3) of Lemma 6.5. So, the decomposition (A.41)
guarantees the equivalence between © and ©. For the same reason, the decomposition
(A.43) guarantees the equivalence between ® and ©. When @' is assumed,
the relation (6.135) with $t = 2$ shows the equality $D_f(\rho\|\sigma) = D_f(\kappa(\rho)\|\kappa(\sigma))$ for
$f(x) = x^2$. So, the decomposition (A.46) of Theorem A.2 guarantees @'+@ ⇒ @.
Also, the relation @ ⇒ © is trivial. Taking the derivative in @' at $t = 0$, we obtain
@ ⇒ ®.

Assume ®). Multiplying o and taking the trace, we have ©. So, we obtain the 
required equivalence relations in this case. 

Step 3: Next, we discuss the case with the assumption $P_\sigma \le P_\rho$. Notice that the
equivalence from © to @ has been already shown.

Assume ©. Due to the assumption $P_\sigma \le P_\rho$, we can apply (6.143) to any continuous
function $f$ defined on $(0, \infty)$. So, we choose $f(x) = x^t$ for any real number $t$.
Hence, we obtain ©, which implies © > @.

Assume ®. The relation (6.135) with $t = -1$ shows the equality $D_f(\rho\|\sigma) =
D_f(\kappa(\rho)\|\kappa(\sigma))$ for $f(x) = x^{-1}$. Since @ > @ > O, we have @, which implies
6 = @. Trivially, 0 => ©. 

The decomposition (A.42) guarantees the equivalence between ® and @. Also, 
the decomposition (A.41) guarantees the equivalence between ® and 9. Similarly, 
the decomposition (A.45) of Theorem A.2 guarantees O@ > O. 

Taking the derivative at t = 0 in ©, we obtain @ > @. Assume ®. Multiplying o 
and taking the trace, we obtain @. So, we obtain the required equivalence relations 
in this case. 

Step 4: Finally, we discuss the case with the assumption $P_\rho = P_\sigma$. Notice that the
equivalence from @ to @ and from @ to @ has been already shown. The decomposition
(A.47) of Theorem A.2 guarantees @+@ with $f(x) = x^2$ ⇒ @. Since @ is a stronger
requirement than @, we obtain the required equivalence relations.


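Proof of Lemma 6.6 For $X \in \mathcal{M}_{\kappa(\sigma),r}$, we have
$$\langle\hat\kappa(\kappa(\sigma)^{1/2}X), \hat\kappa(\kappa(\sigma)^{1/2}X)\rangle = \langle\sigma^{1/2}\kappa^*(X), \sigma^{1/2}\kappa^*(X)\rangle$$
$$= \mathrm{Tr}\,\kappa^*(X)^*\sigma^{1/2}\sigma^{1/2}\kappa^*(X) = \mathrm{Tr}\,\sigma\kappa^*(X)\kappa^*(X)^*$$
$$\le \mathrm{Tr}\,\sigma\kappa^*(XX^*) = \mathrm{Tr}\,\kappa(\sigma)XX^* = \langle\kappa(\sigma)^{1/2}X, \kappa(\sigma)^{1/2}X\rangle. \qquad (6.144)$$
Since any element of $\mathcal{M}_{\kappa(\sigma),r}$ can be written with the form $\kappa(\sigma)^{1/2}X$, (6.144) implies
(6.132).

For $X \in \mathcal{M}_{\kappa(\sigma),r}$, we also have


$$\langle\kappa(\sigma)^{1/2}X, \hat\kappa^*\circ\Delta_{\rho,\sigma}\circ\hat\kappa(\kappa(\sigma)^{1/2}X)\rangle = \langle\hat\kappa(\kappa(\sigma)^{1/2}X), \Delta_{\rho,\sigma}\circ\hat\kappa(\kappa(\sigma)^{1/2}X)\rangle$$
$$= \langle\sigma^{1/2}\kappa^*(X), \Delta_{\rho,\sigma}(\sigma^{1/2}\kappa^*(X))\rangle = \langle\sigma^{1/2}\kappa^*(X), \sigma^{-1/2}P_\sigma\kappa^*(X)\rho\rangle$$
$$= \mathrm{Tr}\,\kappa^*(X)^*\sigma^{1/2}\sigma^{-1/2}P_\sigma\kappa^*(X)\rho = \mathrm{Tr}\,\kappa^*(X)^*P_\sigma\kappa^*(X)\rho$$
$$\le \mathrm{Tr}\,\kappa^*(X)^*\kappa^*(X)\rho = \mathrm{Tr}\,\kappa^*(X^*)\kappa^*(X)\rho, \qquad (6.145)$$
$$\langle\kappa(\sigma)^{1/2}X, \Delta_{\kappa(\rho),\kappa(\sigma)}(\kappa(\sigma)^{1/2}X)\rangle = \langle\kappa(\sigma)^{1/2}X, \kappa(\sigma)^{-1/2}X\kappa(\rho)\rangle$$
$$= \mathrm{Tr}\,X^*\kappa(\sigma)^{1/2}\kappa(\sigma)^{-1/2}X\kappa(\rho) = \mathrm{Tr}\,X^*X\kappa(\rho) = \mathrm{Tr}\,\rho\kappa^*(X^*X). \qquad (6.146)$$


So, the inequalities (5.3), (6.146), and (6.145) imply (6.133).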
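(6.133) implies that $\hat\kappa^*\Delta_{\rho,\sigma}\hat\kappa + t \le \Delta_{\kappa(\rho),\kappa(\sigma)} + t$. Since $x \mapsto -x^{-1}$ is matrix
monotone, we have


$$(\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1} \le (\hat\kappa^*\Delta_{\rho,\sigma}\hat\kappa + t)^{-1}. \qquad (6.147)$$
Since the function $x \mapsto x^{-1}$ satisfies the condition of Corollary A.2, (6.132) implies
that

$$(\hat\kappa^*\Delta_{\rho,\sigma}\hat\kappa + t)^{-1} \le \hat\kappa^*(\Delta_{\rho,\sigma} + t)^{-1}\hat\kappa. \qquad (6.148)$$


Thus, (6.147) and (6.148) yield (6.134).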
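The step from (6.133) to (6.147) uses the matrix monotonicity of $x \mapsto -x^{-1}$: if $0 < A \le B$, then $B^{-1} \le A^{-1}$. A small exact check on $2 \times 2$ rational matrices may help fix the idea. This is only an illustrative sketch; the matrices and helper names below are hypothetical choices, and the contrast with $x \mapsto x^2$ (which is not matrix monotone) is included for comparison only.

```python
from fractions import Fraction as F

def inv2(m):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sub2(m, n):
    """Entrywise difference m - n."""
    return [[m[i][j] - n[i][j] for j in range(2)] for i in range(2)]

def is_psd(m):
    """A real symmetric 2x2 matrix is positive semidefinite iff trace >= 0 and det >= 0."""
    (a, b), (c, d) = m
    return b == c and a + d >= 0 and a * d - b * c >= 0

# A <= B, both positive definite: the order is reversed by inversion.
A = [[F(2), F(1)], [F(1), F(2)]]
B = [[F(3), F(1)], [F(1), F(2)]]
order_AB = is_psd(sub2(B, A))
order_inverse = is_psd(sub2(inv2(A), inv2(B)))

# C <= D, but D^2 - C^2 is not positive semidefinite: squaring is not matrix monotone.
C = [[F(1), F(1)], [F(1), F(1)]]
D = [[F(2), F(1)], [F(1), F(1)]]
order_CD = is_psd(sub2(D, C))
order_square = is_psd(sub2(mul2(D, D), mul2(C, C)))
```

Exact rational arithmetic is used so that the boundary case of a zero determinant is decided without rounding error.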


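Proof of Lemma 6.5 Now, we show Lemma 6.5 by using Lemma 6.6. When $t > 0$,
we have


$$D_{f_t}(\kappa(\rho)\|\kappa(\sigma)) = \mathrm{Tr}\,f_t(\Delta_{\kappa(\rho),\kappa(\sigma)})(\kappa(\sigma))$$
$$\overset{(a)}{=} \mathrm{Tr}\,\kappa(\sigma)^{1/2}f_t(\Delta_{\kappa(\rho),\kappa(\sigma)})(\kappa(\sigma)^{1/2}) = \langle\kappa(\sigma)^{1/2}, (\Delta_{\kappa(\rho),\kappa(\sigma)} + t)^{-1}(\kappa(\sigma)^{1/2})\rangle$$
$$\overset{(b)}{\le} \langle\kappa(\sigma)^{1/2}, \hat\kappa^*\circ(\Delta_{\rho,\sigma} + t)^{-1}\circ\hat\kappa(\kappa(\sigma)^{1/2})\rangle$$
$$= \langle\hat\kappa(\kappa(\sigma)^{1/2}), (\Delta_{\rho,\sigma} + t)^{-1}\circ\hat\kappa(\kappa(\sigma)^{1/2})\rangle$$
$$= \langle\sigma^{1/2}, (\Delta_{\rho,\sigma} + t)^{-1}(\sigma^{1/2})\rangle \overset{(c)}{=} D_{f_t}(\rho\|\sigma). \qquad (6.149)$$
Here, (a) and (c) follow from Lemma 6.4, and (b) follows from (6.134) in Lemma
6.6. Thus, we obtain the first argument.


When $P_\rho \ge P_\sigma$, we have $P_{\kappa(\rho)} \ge P_{\kappa(\sigma)}$. So, the matrices $\sigma$ and $\kappa(\sigma)$ belong to the
spaces spanned by the eigenspaces corresponding to non-zero eigenvalues of the
superoperators $\Delta_{\rho,\sigma}$ and $\Delta_{\kappa(\rho),\kappa(\sigma)}$, respectively. So, we have the relation (6.149) with
$t = 0$. Thus, we obtain the second argument.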

When P, < P, and f(x) = ax + b, we have the equality in (6.149). Choosing 
f(x) = 2, we have Dy(pllo) = Dy (ollp) < Dp (w(o)lk(p)) = Dp (K(o)II(p)). 
Hence, we obtain the third argument. | 


6.8 Historical Note 


6.8.1 Quantum State Estimation 


Research on quantum state estimation was initiated by Helstrom [2] in 1967. He 
derived the one-parameter Cramér—Rao inequality (6.75) for the nonasymptotic ver- 
sion. He also proved the multiparameter SLD Cramér—Rao inequality (6.96) for the 
nonasymptotic version [6]. Yuen and Lax [37] developed the RLD version of the 
Cramér—Rao inequality for estimation with a complex multiparameter. They applied 
it to the estimation of the complex amplitude of the Gaussian state. Belavkin [56] 
derived a necessary and sufficient condition for the achievement of this bound. Fur- 
ther, Holevo [7] derived the RLD Cramér—Rao inequality (6.96) with a real multipa- 
rameter and obtained the lower bound (6.101) with the locally unbiased condition in 
the nonasymptotic case [7]. 

Young introduced the concept of quasiclassical POVM concerning the state family 
[23]. Nagaoka [13] focused on (6.106) and derived the SLD one-parameter Cramér— 
Rao inequality (6.75) with an asymptotic framework. He derived its lower bound 
based on inequality (7.33) [57]. This bound is called the Nagaoka bound. Applying 
it to the quantum two-level system, he obtained (6.109) for the two-parameter case 
[58]. Hayashi [46, 47] applied the duality theorem in infinite-dimensional linear pro- 
gramming to quantum state estimation and obtained (6.109) in the three-parameter 
case as well as in the two-parameter case. After these developments, Gill and Massar 
[24] derived the same equation by a simpler method, which is explained in Exercise 
6.50. Fujiwara and Nagaoka [51] defined the coherent model as a special case of 
pure-state families and showed that bound (6.101) can be attained with the locally 
unbiased and nonasymptotic framework in this case. Following this result, Mat- 
sumoto [29, 52] extended it to the general pure-state case. Further, Hayashi and 
Matsumoto [31] showed that bound (6.101) can be attained with the asymptotic 
framework in the quantum two-level system using the Cramér—Rao approach. The 
achievability of bound (6.101) is discussed in Matsumoto [33] in a general framework 
using irreducible decompositions of group representation. It has also been examined 
in Hayashi [32] using the quantum central limit theorem. 

As a nonasymptotic extension of the quantum Cramér–Rao inequality, Tsuda
and Matsumoto [59] treated its nondifferentiable extension (Hammersley–Chapman–
Robbins–Kshirsagar bound). They also derived the lower bound of mean square errors
of unbiased estimators based on higher-order derivatives (quantum Bhattacharyya
bound). The quantum Bhattacharyya bound has also been obtained by Brody and
Hughston [60] in the pure-state case. Using this bound, Tsuda [61] derived an
interesting bound for the estimation of polynomials of complex amplitude of quantum
Gaussian states. Further, nonparametric estimation has been researched by D'Ariano
[62] and Artiles et al. [63].

The group covariant approach was initiated by Helstrom [64]. He treated the esti- 
mation problem of one-parameter covariant pure-state families. Holevo has estab- 
lished the general framework of this approach [65] and applied it to several problems. 
Ozawa [66] and Bogomolov [67] extended it to the case of the noncompact parame- 
ter space. Holevo applied it to the estimation of the shifted one-parameter pure-state 
family [68]. Holevo [7] and Massar and Popescu [69] treated the estimation of a 
pure qubit state with n-i.i.d. samples using the fidelity risk function. Hayashi [70]
extended it to an arbitrary dimensional case with the general invariant risk function.
Bruß et al. [40] discussed its relation with approximate cloning. Further, Hayashi
[71] applied this method to the estimation of the squeezed parameter with vacuum- 
squeezed-state families. Hayashi and Matsumoto [31] also treated the estimation of 
the full-parameter model in quantum two-level systems using this approach. Bagan 
et al. [72] treated the same problem by the covariant and Bayesian approach. 

Nagaoka [12] extended Bahadur’s large deviation approach to the quantum esti- 
mation and found that the estimation accuracy with condition (6.88) is bounded by 
the Bogoljubov Fisher information in this approach. Hayashi [26] introduced a
stricter condition (6.93) and showed that the estimation accuracy with condition (6.93)
is bounded by the SLD Fisher information. 


6.8.2. Quantum Channel Estimation 


Fujiwara [73] started to treat the estimation of a quantum channel within the frame- 
work of quantum state estimation. Sasaki et al. [74] discussed a similar estimation 
problem with the Bayesian approach in a nonasymptotic setting. Fischer et al. [75] 
focused on the use of the maximally entangled input state for the estimation of the 
Pauli channel. Fujiwara and Imai [76] showed that in the estimation of the Pauli
channel $\kappa_\theta$, the best estimation performance is obtained if and only if the input state
is the n-fold tensor product of the maximally entangled state $|\Phi_d\rangle\langle\Phi_d|^{\otimes n}$. Exercise
6.54 treats the same problem using a different approach. After this result, Fujiwara
[77] and Tanaka [78] treated, independently, the estimation problem of the amplitude 
damping channel. Especially, Fujiwara [77] proceeded to the estimation problem of 
the generalized amplitude damping channel, which is the more general and diffi- 
cult part. De Martini et al. [79] implemented an experiment for the estimation of an 
unknown unitary. 

Concerning the estimation of unitary operations, Bužek et al. [80] were the first to
focus on estimating an unknown one-parameter unitary action. They showed that
the error goes to 0 with the order $1/n^2$, where n is the number of applications of the
unknown operation. Acín et al. [81] characterized the optimal input state for the
SU(d) estimation where the input state is entangled with the reference system.
On the other hand, Fujiwara [82] treated this problem using the Cramér–Rao
approach in the SU(2) case. This result was extended by Ballester [83]. Bagan et
al. [84] treated the estimation of the unknown n-identical SU(2) operations using
entanglement with the reference system. They also showed that the optimal error
goes to 0 at a rate of $1/n^2$ and effectively applied the Clebsch–Gordan coefficient
method to this problem. Hayashi [85, 86] treated the same problem using a different
method. He derived a relation between this problem and that of Bužek et al.
[80] and applied the obtained relation to this problem. He also pointed out that the
multiplicity of the same irreducible representations can be regarded as the reference
system, i.e., the effect of “self-entanglement.” Indeed, independently of Hayashi,
Chiribella et al. [87] and Bagan et al. [88] also pointed out this effect of the
multiplicity based on the idea of Chiribella et al. [89]. That is, these three groups proved
that the error of the estimation of SU(2) goes to 0 at a rate of $1/n^2$. The role of this
“self-entanglement” is widely discussed in Chiribella et al. [90]. Note that, as was
mentioned by Hayashi [85], the Cramér–Rao approach does not necessarily provide
the optimal coefficient in the estimation of unitary operations by the use of
entanglement. In particular, as was shown in [91], under the phase estimation with energy
constraint, the Cramér–Rao approach does not work because the maximum Fisher
information is infinite while the true minimum error can be characterized by using
the group covariant approach. Chiribella et al. [92] derived the optimal estimator in the
Bayesian setup. Recently, Hayashi [93] discussed the Cramér–Rao approach more
deeply. He showed the additivity of the maximum of the RLD Fisher information
in the case of channel estimation. This fact shows that when the maximum of the
RLD Fisher information exists, the maximum SLD Fisher information increases only
linearly, i.e., the minimum error behaves as $O(1/n)$.


6.8.3 Geometry of Quantum States 


The study of monotone metrics on quantum state families was initiated by Morozova and
Chentsov [94]. Following this research, Petz [3] showed that every monotone metric 
is constructed from the matrix monotone function or the matrix average. Nagaoka 
introduced an SLD one-parameter exponential family [13] and a Bogoljubov one- 
parameter exponential family [14], characterized them as (6.44) and (6.45), respec- 
tively, and calculated the corresponding divergences (6.57) and (6.58) [14]. He also 
calculated the Bogoljubov m divergence as (6.66) [18]. Other formulas, (6.60) and
(6.67), for divergences were first obtained by Hayashi [95]. Further, Matsumoto
[96] obtained an interesting characterization of RLD (m)-divergence. Moreover, he 
showed that an efficient estimator exists only in the SLD one-parameter exponential 
family (Theorem 6.7) [13, 22]. However, before this study, Belavkin [56] introduced 
a complex-parameterized exponential family and showed that the RLD version of 
the Cramér—Rao bound with the complex multiparameter could be attained only in 
special cases. Theorem 6.7 coincides with its real-one-parameter case. Following 
this result, Fujiwara [97] showed that any unitary SLD one-parameter exponential
family is generated by an observable satisfying the canonical commutation relation.
In addition, Amari and Nagaoka [8] introduced the torsion concerning the e parallel
translation as the limit of RHS − LHS in (6.68) and showed that the torsion-free
inner product is only a Bogoljubov metric. They proved that the torsions of
e-connection vanish only for a Bogoljubov inner product. They also showed that
the divergence can be defined by a convex function if and only if the torsions 
of e-connection and m-connection vanish. Combining these facts, we can derive 
Theorem 6.5. However, their proof is based on the calculation of Christoffel sym- 
bols. In this textbook, Theorem 6.5 is proved without any use of Christoffel symbols. 
Further, Nagaoka [11, 12] showed that the Bogoljubov metric is characterized 
by the limit of the quantum relative entropy as (6.34). Concerning the SLD inner 
product, Uhlmann [9] showed that the SLD metric is the limit of the Bures distance 
in the mixed-state case as (6.33). Matsumoto [10] extended it to the general case. 


6.8.4 Equality Condition for Monotonicity of Relative 
Entropy 


Although Petz [54] derived Corollary 6.1 in terms of operator algebra, he assumed
that $\rho$ and $\sigma$ are invertible. Also, he assumed that the dual map $\kappa^*$ is given as the
inclusion of a subalgebra, which corresponds to the case when the original map $\kappa$ is
the partial trace. For a general TP-CP map $\kappa$, we have a Stinespring representation
as $\kappa(\rho) = \mathrm{Tr}_{A,C}\,U_\kappa(\rho\otimes\rho_0)U_\kappa^*$. Then, the dual map $\kappa^*$ is given as the combination
of the inclusion of a subalgebra and the multiplication of an isometry. To reduce it to
the case of the partial trace, we need to treat two states $\rho\otimes\rho_0$ and $\sigma\otimes\rho_0$, which are
not invertible. So, Petz's proof for Corollary 6.1 does not work even for invertible
states $\rho$ and $\sigma$ when the TP-CP map $\kappa$ is not a partial trace. To avoid assuming the
invertibility of $\rho$ and $\sigma$, we introduce the matrix space $\mathcal{M}_{\sigma,r}(\mathcal{H})$ although
Petz's original proof employed only the full matrix space $\mathcal{M}(\mathcal{H})$.

When $\rho$ and $\sigma$ are invertible, Petz [54] also derived an equivalent Condition @
of Theorem 6.14 by replacing $t$ by $it$ (which is called the modified Condition ©).
Since Petz [54] treated the infinite-dimensional case, $\sigma^{-t}$ might be unbounded. To
avoid the difficulty of unboundedness, he employed $\sigma^{-it}$ instead of $\sigma^{-t}$. However,
when $\rho$ and $\sigma$ are not invertible, we need to treat $0^{-it}$, which cannot be defined. So,
in this book, we employ $\sigma^{-t}$ instead of $\sigma^{-it}$ with careful treatment of the projections
$P_\rho$ and $P_\sigma$.

His derivation © requires only the inequality (5.3), which can be derived from
the trace-preserving property and 2-positivity for $\kappa$. So, in another paper [55], he
rewrote the derivation with the modified Condition © in terms of linear algebra
by assuming this weaker condition when $\rho$ and $\sigma$ are invertible. Theorems 6.13
and 6.14 are extensions of this part in the following sense. Petz [54, 55] assumed
that $\rho$ and $\sigma$ are invertible and treated only the quantum relative entropy $D(\rho\|\sigma)$.
However, Theorems 6.13 and 6.14 can treat non-invertible $\rho$ and $\sigma$ and general
quantum $f$-relative entropies, where the possible matrix convex function $f$ depends
on the relation of images of $\rho$ and $\sigma$. Then, Corollary 6.1 gives the same equivalence
condition under the TP-CP condition for general quantum $f$-relative entropies.

In addition, Ohya and Petz [98] applied the result of Petz [54] to the case with 
measurement. Then, they characterized the existence of a measurement attaining 
equality in the monotonicity of the relative entropy (3.18) when $\rho$ and $\sigma$ are invertible.
Indeed, once we obtain Theorem 5.8, it is not difficult to derive Theorem 3.6, as shown 
in Sect. 5.4. However, it is not easy to prove this argument without use of Theorem 
5.8. Nagaoka [18] showed the same fact without assuming the invertible condition by 
using information geometrical method (Exercise 6.32 and Theorem 3.6). Fujiwara 
[17] improved these discussions further. 


6.9 Solutions of Exercises 


Exercise 6.1 Equations (6.1) and (6.4) yield (6.6). Equations (6.4) and (6.5) yield (6.7).


Exercise 6.2 Condition (6.1) implies that 


YX aire OS Weary x 


psx 


Sr eal Sauna: 
Also, Condition (6.2) yields that
$\langle X, X\rangle^{(e)}_{\rho,x} = \mathrm{Tr}\,X^*E_{\rho,x}(X) \ge 0$.


Exercise 6.3 Assume @. Then, for two Hermitian matrices X and Y, E,, ,(X) and 
E,,,x(Y) are also Hermitian. Then, we have 


(X, YO. = Tr X*E,<(¥) 2S Tr Epe(X)*Y = Tr Y* Eps (XY, X)O, 


where (a) follows from Condition (6.1). Hence, we obtain ©. 
Assume @. Then, two Hermitian matrices X and Y satisfy that 


Tr YEp,x(X)* 2 Tr X*E,«(¥) = (X,Y), 


a xX STE g(x) = WY EX), 
where (a) follows from Condition (6.1). Hence, we obtain @. 
Assume @). We choose two matrices X and Y such that E,, ,(X) and E, ,(Y) are 


Hermitian. So, the two matrices X and Y also are Hermitian. Denoting E, ,(X) and 
E,,x(Y) by A and B, we have 


(A, B)® = (x, Y)@ = Trx*e,.) S rE, (XY 


=T YE = 7,478 = (8, Ay”, 
where (a) follows from Condition (6.1). Hence, we obtain ©. 
Assume ©. We choose two matrices X and Y such that E,, ,(X) and E, ,(Y) are 
Hermitian. Denoting E, ,(X) and E,,(Y) by A and B, we have 
PY A =.) Sa A SA By Or 


=Tr X*E,,(¥) 2 Tr E,.(X)*Y = Tr YE, x(X), 


where (a) follows from Condition (6.1). Hence, we obtain @. 


Exercise 6.4 


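(a) We consider only the case when $x > y$ because the opposite case can be treated
by swapping $x$ and $y$.

$$\frac{1}{\mathrm{Lm}(x,y)} - \frac{2}{x+y} = \frac{1}{x-y}\Big(\log x - \log\frac{x+y}{2} + \log\frac{x+y}{2} - \log y\Big) - \frac{2}{x+y}$$
$$= \frac{1}{x-y}\int_0^{\frac{x-y}{2}}\Big(\frac{1}{\frac{x+y}{2}+t} + \frac{1}{\frac{x+y}{2}-t}\Big)dt - \frac{2}{x+y}$$
$$= \frac{1}{x-y}\int_0^{\frac{x-y}{2}}\Big(\frac{x+y}{(\frac{x+y}{2})^2 - t^2} - \frac{4}{x+y}\Big)dt$$
$$= \frac{1}{x-y}\int_0^{\frac{x-y}{2}}\frac{4}{x+y}\Big(\frac{(\frac{x+y}{2})^2}{(\frac{x+y}{2})^2 - t^2} - 1\Big)dt$$
$$= \frac{1}{x-y}\int_0^{\frac{x-y}{2}}\frac{4}{x+y}\cdot\frac{t^2}{(\frac{x+y}{2})^2 - t^2}\,dt \ge 0.$$


The equality in the final inequality holds if and only if $x = y$.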
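The inequality of part (a), $1/\mathrm{Lm}(x,y) \ge 2/(x+y)$ for the logarithmic mean $\mathrm{Lm}(x,y) = (x-y)/(\log x - \log y)$, can be spot-checked numerically. The snippet below is only a sanity check on a few hand-picked sample points (the helper names and the samples are illustrative assumptions), not a substitute for the derivation.

```python
import math

def log_mean(x, y):
    """Logarithmic mean Lm(x, y) = (x - y) / (log x - log y), with Lm(x, x) = x."""
    if x == y:
        return x
    return (x - y) / (math.log(x) - math.log(y))

def gap(x, y):
    """1/Lm(x, y) - 2/(x + y), which part (a) shows to be nonnegative."""
    return 1.0 / log_mean(x, y) - 2.0 / (x + y)

samples = [(1.0, 1.0), (2.0, 1.0), (0.9, 0.1), (5.0, 0.01)]
gaps = [gap(x, y) for x, y in samples]
```

The gap vanishes for the pair with $x = y$ and is strictly positive for the others, in line with the equality condition stated above.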


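(b) Let $\rho = \sum_{j=1}^d \lambda_j E_j$ be the spectral decomposition of $\rho$. We have
$E_{\rho,s}^{-1}(A) = \sum_{j,k=1}^d \frac{2}{\lambda_j+\lambda_k}E_jAE_k$ because


$$E_{\rho,s}\Big(\sum_{j,k=1}^d \frac{2}{\lambda_j+\lambda_k}E_jAE_k\Big) = \frac{1}{2}\Big(\rho\sum_{j,k=1}^d \frac{2}{\lambda_j+\lambda_k}E_jAE_k + \sum_{j,k=1}^d \frac{2}{\lambda_j+\lambda_k}E_jAE_k\,\rho\Big)$$
$$= \sum_{j,k=1}^d \frac{\lambda_j}{\lambda_j+\lambda_k}E_jAE_k + \sum_{j,k=1}^d \frac{\lambda_k}{\lambda_j+\lambda_k}E_jAE_k = \sum_{j,k=1}^d E_jAE_k = A.$$


Thus,
$$(\|A\|^{(m)}_{\rho,s})^2 = \mathrm{Tr}\,AE_{\rho,s}^{-1}(A) = \sum_{j,k=1}^d \frac{2}{\lambda_j+\lambda_k}\mathrm{Tr}\,AE_jAE_k.$$
Since
$$\int_0^1 x^\lambda y^{1-\lambda}d\lambda = y\int_0^1 e^{(\log x - \log y)\lambda}d\lambda = y\Big[\frac{e^{(\log x - \log y)\lambda}}{\log x - \log y}\Big]_0^1 = \frac{x-y}{\log x - \log y} = \mathrm{Lm}(x,y),$$
we have
$$E_{\rho,b}\Big(\sum_{j,k=1}^d \frac{1}{\mathrm{Lm}(\lambda_j,\lambda_k)}E_jAE_k\Big) = \int_0^1 \rho^\lambda\sum_{j,k=1}^d \frac{1}{\mathrm{Lm}(\lambda_j,\lambda_k)}E_jAE_k\,\rho^{1-\lambda}d\lambda$$
$$= \sum_{j,k=1}^d \frac{1}{\mathrm{Lm}(\lambda_j,\lambda_k)}\int_0^1 \lambda_j^\lambda\lambda_k^{1-\lambda}d\lambda\;E_jAE_k = \sum_{j,k=1}^d E_jAE_k = A.$$
Thus, $E_{\rho,b}^{-1}(A) = \sum_{j,k=1}^d \frac{1}{\mathrm{Lm}(\lambda_j,\lambda_k)}E_jAE_k$. Hence,
$$(\|A\|^{(m)}_{\rho,b})^2 = \mathrm{Tr}\,AE_{\rho,b}^{-1}(A) = \sum_{j,k=1}^d \frac{1}{\mathrm{Lm}(\lambda_j,\lambda_k)}\mathrm{Tr}\,AE_jAE_k.$$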

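(c) Since $A\rho - \rho A = \sum_{j,k=1}^d E_jAE_k(\lambda_k - \lambda_j)$, we have


$$\mathrm{Tr}\,(A\rho - \rho A)(A\rho - \rho A)^* = -\mathrm{Tr}\,(A\rho - \rho A)(A\rho - \rho A)$$
$$= -\mathrm{Tr}\Big(\sum_{j,k=1}^d E_jAE_k(\lambda_k - \lambda_j)\Big)\Big(\sum_{j',k'=1}^d E_{j'}AE_{k'}(\lambda_{k'} - \lambda_{j'})\Big)$$
$$= -\mathrm{Tr}\sum_{j,k=1}^d E_jAE_kA(\lambda_k - \lambda_j)(\lambda_j - \lambda_k) = \mathrm{Tr}\sum_{j,k=1}^d (\lambda_k - \lambda_j)^2E_jAE_kA.$$


(d) The statements (a) and (b) guarantee that $\|A\|^{(m)}_{\rho,b} \ge \|A\|^{(m)}_{\rho,s}$. The equality
$\frac{2}{\lambda_j+\lambda_k} = \frac{1}{\mathrm{Lm}(\lambda_j,\lambda_k)}$ holds only when $\lambda_j = \lambda_k$. Therefore, © holds if and only if $E_jAE_kA = 0$
or $\lambda_j = \lambda_k$ holds for any $k \ne j$. Due to (c), the latter condition is equivalent with ©.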
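Exercise 6.5
(a) $\kappa_M(E_{\rho,s}(X))$ is commutative with $M_i$. Since $\kappa_M(E_{\rho,s}(X)) = E_{\kappa_M(\rho),s}(\kappa_{M,\rho,s}(X))$,
$E_{\kappa_M(\rho),s}(\kappa_{M,\rho,s}(X))$ is commutative with $M_i$. Since $[\kappa_M(\rho), M_i] = 0$,


$$0 = [E_{\kappa_M(\rho),s}(\kappa_{M,\rho,s}(X)), M_i]$$
$$= \Big[\frac{1}{2}\big(\kappa_M(\rho)\kappa_{M,\rho,s}(X) + \kappa_{M,\rho,s}(X)\kappa_M(\rho)\big), M_i\Big]$$
$$= \frac{1}{2}\big(\kappa_M(\rho)[\kappa_{M,\rho,s}(X), M_i] + [\kappa_{M,\rho,s}(X), M_i]\kappa_M(\rho)\big)$$
$$= E_{\kappa_M(\rho),s}([\kappa_{M,\rho,s}(X), M_i]). \qquad (6.150)$$


Since the map $E_{\kappa_M(\rho),s}$ is injective, $[\kappa_{M,\rho,s}(X), M_i] = 0$.
(b) Assume that every $M_i$ commutes with $X$. Thus,


$$E_{\kappa_M(\rho),s}(X) = \frac{1}{2}\big(\kappa_M(\rho)X + X\kappa_M(\rho)\big) = \kappa_M\Big(\frac{1}{2}(\rho X + X\rho)\Big)$$
$$= \kappa_M(E_{\rho,s}(X)) = E_{\kappa_M(\rho),s}(\kappa_{M,\rho,s}(X)). \qquad (6.151)$$


Since the map $E_{\kappa_M(\rho),s}$ is injective, $X = \kappa_{M,\rho,s}(X)$.
Conversely, when $X = \kappa_{M,\rho,s}(X)$, we have $\kappa_M(E_{\rho,s}(X)) = E_{\kappa_M(\rho),s}(X)$. Thus,


$$0 = [M_i, \kappa_M(E_{\rho,s}(X))] = [M_i, E_{\kappa_M(\rho),s}(X)]$$
$$= \Big[M_i, \frac{1}{2}\big(\kappa_M(\rho)X + X\kappa_M(\rho)\big)\Big]$$
$$= \frac{1}{2}\big(\kappa_M(\rho)[M_i, X] + [M_i, X]\kappa_M(\rho)\big) = E_{\kappa_M(\rho),s}([M_i, X]). \qquad (6.152)$$


That is, $[M_i, X] = 0$.

(c) Since $\kappa_{M,\rho,s}(X)$ is commutative with $M_i$, the statement (b) implies that
$\kappa_{M,\rho,s}\circ\kappa_{M,\rho,s}(X) = \kappa_{M,\rho,s}(\kappa_{M,\rho,s}(X)) = \kappa_{M,\rho,s}(X)$.

(d) Assume that every matrix $M_i$ commutes with $Y$. Then,


$$\langle Y, X\rangle^{(e)}_{\rho,s} = \mathrm{Tr}\,Y^*E_{\rho,s}(X) = \mathrm{Tr}\,Y^*\kappa_M(E_{\rho,s}(X))$$
$$= \mathrm{Tr}\,Y^*E_{\kappa_M(\rho),s}(\kappa_{M,\rho,s}(X)) = \langle Y, \kappa_{M,\rho,s}(X)\rangle^{(e)}_{\kappa_M(\rho),s}. \qquad (6.153)$$


(e) Similar to (6.150), we have


$$[E_{\kappa_M(\rho),r}(\kappa_{M,\rho,r}(X)), M_i] = E_{\kappa_M(\rho),r}([\kappa_{M,\rho,r}(X), M_i]).$$


Hence, the statement (a) holds for the RLD.

Similar to (6.151), when $[M_i, X] = 0$, we have $E_{\kappa_M(\rho),r}(X) = E_{\kappa_M(\rho),r}(\kappa_{M,\rho,r}(X))$. Hence, when
$[M_i, X] = 0$, $X = \kappa_{M,\rho,r}(X)$. When $X = \kappa_{M,\rho,r}(X)$, similar to (6.152), we have $0 =
[M_i, \kappa_M(E_{\rho,r}(X))] = [M_i, E_{\kappa_M(\rho),r}(X)] = E_{\kappa_M(\rho),r}([M_i, X])$. That is, $[M_i, X] = 0$.
Hence, the statement (b) holds for the RLD.
The statement (c) for the RLD follows from the statements (a) and (b) for the
RLD. Similar to (6.153), we have $\langle Y, X\rangle^{(e)}_{\rho,r} = \langle Y, \kappa_{M,\rho,r}(X)\rangle^{(e)}_{\kappa_M(\rho),r}$, i.e., the statement
(d) holds for the RLD.
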
Exercise 6.6 We have 


(ea (ADI) 5)” 


1 
=Tr 3 (km (P)KM,p,s(X) = KM,p,s(X) km (P))KM,p,s (X) 


1 
=Tr 5 (PRM, p,s(X) ae KM, p,s (X)p)KM.p.s(X) = (Kat.p.s XO). 


As shown in the statement (c) of Exercise 6.5, K,,; is a projection. Since ||A Ea = 
|| X|\, © is equivalent with Karp,.(X) = X. Due to (b) of Exercise 6.5, the latter 
condition is equivalent with @. 

Exercise 6.7 The statement (d) of Exercise 6.4 and Theorem 6.1 show that 
JAN) > PAN? and [Al]? > Ilkar(A)I,,. 2 tespectively. Hence, || Al") > 


ps — 
(m) 
a (AD ll ne(p), s° 
Therefore, ® is equivalent with || A ||") = |||") and |] A[]") = lnm (A) yy «- 


Due to (d) of Exercise 6.4, the former is equivalent with [p, A] = 0. Due to Exercise 
6.6, the latter is equivalent with @ of Exercise 6.6. Hence, @ is equivalent with @. 


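Exercise 6.8 Choose a unitary $U$ such that $\|X\rho Y\|_1 = \mathrm{Tr}\,UX\rho Y$. Thus, the Schwarz
inequality implies that


$$\|X\rho Y\|_1 = |\langle(UX)^*, Y\rangle^{(e)}_{\rho,r}| \le \sqrt{\mathrm{Tr}\,\rho YY^*}\sqrt{\mathrm{Tr}\,\rho(UX)^*UX}$$
$$= \sqrt{\mathrm{Tr}\,\rho YY^*}\sqrt{\mathrm{Tr}\,\rho X^*X}.$$


Exercise 6.9 Choose a unitary $U$ such that $\|X\|_1 = |\mathrm{Tr}\,XU|$. The Schwarz inequality implies that


$$\|X\|_1 = |\mathrm{Tr}\,XU| = |\mathrm{Tr}\,\rho\rho^{-1}XU| = |\langle\rho^{-1}X, U^*\rangle^{(e)}_{\rho,r}|$$
$$\le \sqrt{\mathrm{Tr}\,\rho\rho^{-1}X(\rho^{-1}X)^*}\sqrt{\mathrm{Tr}\,\rho U^*U} = \sqrt{\mathrm{Tr}\,\rho^{-1}XX^*}\sqrt{\mathrm{Tr}\,\rho U^*U}$$
$$= \sqrt{\mathrm{Tr}\,\rho^{-1}XX^*}.$$


Exercise 6.10 The Schwarz inequality for the inner product $\mathrm{Tr}\,X\rho^{1/2}Y^*\rho^{1/2}$ implies
that


$$\|X\|_1 = |\mathrm{Tr}\,XU| = |\mathrm{Tr}\,(\rho^{-1/2}X\rho^{-1/2})\rho^{1/2}U\rho^{1/2}|$$
$$\le \sqrt{\mathrm{Tr}\,\rho^{-1/2}X\rho^{-1/2}\cdot\rho^{1/2}(\rho^{-1/2}X\rho^{-1/2})^*\rho^{1/2}}\sqrt{\mathrm{Tr}\,\rho^{1/2}U^*\rho^{1/2}U}$$
$$= \sqrt{\mathrm{Tr}\,\rho^{-1/2}X\rho^{-1/2}X^*}\sqrt{\mathrm{Tr}\,\rho^{1/2}U^*\rho^{1/2}U} \le \sqrt{\mathrm{Tr}\,\rho^{-1/2}X\rho^{-1/2}X^*}.$$

Exercise 6.11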


1 
(IX ® ell, = | (VX @ bely)2 pdr 
) 


=|Tr Xol? = (2, X)9,P < WU, PUXI@,)? = (XI). 


Due to the equality condition of the Schwarz inequality, the equality holds if and
only if $X$ is a constant multiple of $I$.


Exercise 6.12 For X € {X|[X, M;] = 0 Vi} and a matrix Y, the statements (d) and 
(e) of Exercise 6.5 imply that 
(44 9,x(X), VIO. = (X, wate. = (X, YY. 


for $x = s, r$. The above relations guarantee that $\kappa_{M,\rho,x}(X) = X$. That is, $\kappa_{M,\rho,x}$
is the dual map of the inclusion of the matrix subspace $\{X \mid [X, M_i] = 0\ \forall i\}$ for
$x = s, r$.

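Exercise 6.13 When $X$ is Hermitian, $E_{\rho,s}(X) = \frac{1}{2}(X\rho + \rho X)$, $E_{\rho,\frac{1}{2}}(X) = \sqrt{\rho}X\sqrt{\rho}$,
and $E_{\rho,b}(X) = \int_0^1 \rho^\lambda X\rho^{1-\lambda}d\lambda$ are Hermitian.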


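Exercise 6.14 Since $0 = \frac{d}{d\theta}\langle\phi_\theta|\phi_\theta\rangle = \langle\phi_\theta|\frac{d\phi_\theta}{d\theta}\rangle + \langle\frac{d\phi_\theta}{d\theta}|\phi_\theta\rangle$, $\langle\phi_\theta|\frac{d\phi_\theta}{d\theta}\rangle$ is a pure
imaginary number, which is denoted by $ia$. Thus, writing $\tilde\phi_\theta := \frac{d\phi_\theta}{d\theta} - ia\phi_\theta$, we have


$$\frac{d}{d\theta}|\phi_\theta\rangle\langle\phi_\theta| = \Big|\frac{d\phi_\theta}{d\theta}\Big\rangle\langle\phi_\theta| + |\phi_\theta\rangle\Big\langle\frac{d\phi_\theta}{d\theta}\Big|$$
$$= |\tilde\phi_\theta\rangle\langle\phi_\theta| + ia|\phi_\theta\rangle\langle\phi_\theta| + |\phi_\theta\rangle\langle\tilde\phi_\theta| - ia|\phi_\theta\rangle\langle\phi_\theta|$$
$$= |\tilde\phi_\theta\rangle\langle\phi_\theta| + |\phi_\theta\rangle\langle\tilde\phi_\theta| = E_{|\phi_\theta\rangle\langle\phi_\theta|,s}\big(2(|\tilde\phi_\theta\rangle\langle\phi_\theta| + |\phi_\theta\rangle\langle\tilde\phi_\theta|)\big).$$


Hence,


$$J_{\theta,s} = \mathrm{Tr}\,(|\tilde\phi_\theta\rangle\langle\phi_\theta| + |\phi_\theta\rangle\langle\tilde\phi_\theta|)\,2(|\tilde\phi_\theta\rangle\langle\phi_\theta| + |\phi_\theta\rangle\langle\tilde\phi_\theta|) = 4\langle\tilde\phi_\theta|\tilde\phi_\theta\rangle.$$


However, there is no matrix $X$ such that $E_{|\phi_\theta\rangle\langle\phi_\theta|,x}(X) = |\tilde\phi_\theta\rangle\langle\phi_\theta| + |\phi_\theta\rangle\langle\tilde\phi_\theta|$
for $x = r, b$. Hence, both the RLD Fisher information and the Bogoljubov Fisher
information diverge.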
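The identity $J_{\theta,s} = 4\langle\tilde\phi_\theta|\tilde\phi_\theta\rangle$ can be checked on a concrete qubit family. The sketch below assumes the hypothetical family $\phi_\theta = (\cos\alpha, e^{i\theta}\sin\alpha)$ with a fixed angle $\alpha$ (this family, the numerical values, and the helper names are illustrative choices, not taken from the text). It verifies that $L = 2\,d\rho_\theta/d\theta$ satisfies the SLD equation $\frac{1}{2}(\rho L + L\rho) = d\rho_\theta/d\theta$ and that $\mathrm{Tr}\,(d\rho_\theta/d\theta)L$ agrees with $4\langle\tilde\phi_\theta|\tilde\phi_\theta\rangle$.

```python
import cmath
import math

def inner(u, v):
    """<u|v> with conjugation on the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def outer(u, v):
    """|u><v| as a 2x2 matrix."""
    return [[u[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(m, n, scale=1.0):
    """m + scale * n, entrywise."""
    return [[m[i][j] + scale * n[i][j] for j in range(2)] for i in range(2)]

alpha, theta = 0.7, 0.3                      # arbitrary sample values
c, s = math.cos(alpha), math.sin(alpha)
phase = cmath.exp(1j * theta)
phi = [complex(c), s * phase]                # phi_theta
dphi = [0j, 1j * s * phase]                  # d(phi_theta)/d(theta)

a = (inner(phi, dphi) / 1j).real             # <phi|dphi> = i a is purely imaginary
phi_tilde = [dphi[k] - 1j * a * phi[k] for k in range(2)]

rho = outer(phi, phi)
drho = add(outer(dphi, phi), outer(phi, dphi))
L = [[2 * drho[i][j] for j in range(2)] for i in range(2)]   # candidate SLD

# SLD equation: (rho L + L rho) / 2 should reproduce drho.
sym = [[0.5 * x for x in row] for row in add(matmul(rho, L), matmul(L, rho))]
sld_error = max(abs(sym[i][j] - drho[i][j]) for i in range(2) for j in range(2))

prod = matmul(drho, L)
J_trace = (prod[0][0] + prod[1][1]).real     # Tr (drho L)
J_formula = 4 * inner(phi_tilde, phi_tilde).real
```

For this family both expressions evaluate to $\sin^2(2\alpha)$, independently of $\theta$.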


Exercise 6.15 Apply Theorem 6.2 to the entanglement-breaking channel
$\rho \mapsto \sum_i (\mathrm{Tr}\,M_i\rho)|u_i\rangle\langle u_i|$ with the CONS $\{u_i\}$.


Exercise 6.16 Since £Tr Mipo = Tr M40 = (Mi, Los), is real number, 
(Mi, Los) = (Los; Mi Thus, 


Pos 


(45 Tt Mipoy? 


= Te iipg = 


(M;, Las) ,(Lo,s» Mi), 
(Mi, I) ges 
Exercise 6.17 


(a) Exercise 6.12 shows that «7,),,(X) is the projection to the space spanned by 
{M;}; with respect to the umes product (, )‘°). Thus, Ki.p.s(X) = 0;(Mi, X)©) Mi. 


ps" 
(b) Since (M;, M; y@) Exercise 6.16 yields that 


pos —~ (M,, ao > 


2 
(Ilats09,s(Lo,s IOs). _ (13 Bs (Mi, Lo.s) S Mi 2.) 


= 2 (M;, Mj) (Mj, Los) (Los, Mj) 


ps 


(M;, Los) © (Los, Mi), - JM 
(Mi, iby ser 


= 


o- 


(c) The statement (b) guarantees that the equation Jp, = i” is equivalent with the 
equation || Ky, ),5(Lo,s) a = ||Los ee s: SINCE KV ,py,s 1S a projection (See Exercise 
6.12), the latter condition is equivalent with Ky_,,,,s(Lo,s) = Los. Due to (b) of 


Exercise 6.5, the final condition holds if and only if every M; commutes with Lg .. 
Exercise 6.18 
(a) Use the formula of the Beta function. 


(b) Use exp(X (0)) = ©, X()" ‘and (a). 


n! 


“0 


1 
[ ex pax (6) = exp((1 — A)X(0)) dX 
pgs ine (1 — Ay" X (8) 
-[{> . > dy 
n=0 m=0 
>> nim! X(@)" dX(6) X(6)" 
~ (n+m)! ni! dom! 


m! 


n=0 m=0 
wu! 4X0), 
gp PD rea dé ae) 
eile = ee m wa 1 dX()F _ dexp(X()) 
aa te eS age “Yi dd = = «dd 


dp2" 
a = y= 1 Po ® pp @ Ht 7 @ po ® po. Hence, we 


i-1 aes 


d @n : 
have 7a = E,g . Li=1 J @1@Lo,x Sle! . Since (J, Le), = 0 and 


i-1 n-i 


UI,1 ee = |, the relation (6.15) guarantees that 
n 
Ton = [QL @L SL. @L@ Lion. | = (Lox, Lodo = Max: 
i=l j-] n-i 
Exercise 6.20 Since (Lj ; ix LO), iy y= (Lp eas Loix) y)*, Jo,x is Hermitian. 


; 5, b. Hence, 
F) : 

J6s:57 = ia Lopjal , = Tr aa L,,,j,x 18 a real number. Thus, Exercise 6.20 

guarantees that the Fisher information matrix J, is real symmetric. 


Exercise 6.21 Due to Exercise 6.13, L,,;,, is Hermitian for x = s 


wo\a 01 
Exercise 6.22 Consider the following example: pg = ( : ), t= ( ), 


0 a 
apy __ e zy 
a =\i 9 J 


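Exercise 6.23 The relation (6.32) guarantees that $L_{\theta,b} = \frac{d\log\rho_\theta}{d\theta}$. Since $\log\rho_\theta = e^{-i\theta Y}(\log\rho)e^{i\theta Y}$, we have $L_{\theta,b} = \frac{d\,e^{-i\theta Y}(\log\rho)e^{i\theta Y}}{d\theta} = i[\log\rho, Y]$.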


Exercise 6.24 Consider the quantum state family $\{\rho_\theta = e^{-i\theta Y}\rho e^{i\theta Y}\}$ in Exercise 6.23. The e and m representations of the derivative are $i[\log\rho, Y]$ and $i[\rho, Y]$. Then, we have $i[\rho, Y] = E_{\rho,b}(i[\log\rho, Y])$.

Exercise 6.25 

(a) Simple calculations. 

(b) Use (6.20) and (5.26). 

(c) Use the relation given in (b). 

Exercise 6.26 Consider the TP-CP map fg := = Ap, @lldj+da- d)p5 ® |2) (2| > 
App + - d)p3- Then, we denote the Fisher information of the family {9} by Jo, Pr 
Theorem 6.2 implies that Jo, x ; Jo,x- 

Now, we choose Lg j,x as 4% — =F. ,(Lo.i,x). Then, Lo. x= ys 1 Lo,i,x @ li) | 
satisfies dh = Es, x (Lax). ine Jox = = Ib x +0 - Ade. which implies Jp. < 
Adget (i - os 

Next, we Sune that the space spanned by the supports of at and Pp are orthog- 
onal to those of Sf and pi Then, Expt oe fo den Leix) = Aa +(1=A)z, apy 
Thus, we have ie = Ade g +(1—-A) dps 


Exercise 6.27 First, notice that 
e251$,e35 = §; (6.154) 
fori = 2,3 and 


er (1 + x1S,)e2"! 
e(1+ x1) vere — *1) ie e(1+x1) ae a XD 5 
Thus,


3 
1 t . t 
sed + 2 x; S')je2™ 

e+e) +e "= *1) se etl +a1)—e "(1 — xi) 


5 5 S, + x2 + 353. 


t 


Since Tr 4e251\(7 + yas es = ae we have jis(t) = log 


3 
“dtmpte ‘U-*1) Therefore, we obtain the desired argument. 


Exercise 6.28 First, for a given SLD geodesic JT fo choose a unitary matrix U; such 
that U LU* is equal to the constant times of S;. Then, the SLD geodesic U IT} ,oU e 
has the form given in Exercise 6.27. Next, choose another unitary matrix U2 such 
that 


U2S,U; = $,, Uo(x2So. + x353)UZ =) a + ie Sy (6.155) 


Then, {U,U TT} ,oU*U3} is Sy when a = ad he + cee 
Exercise 6.29 


(a) It follows from $\int_0^1 x^2 t(1+tx)^{-1}\,dt = x - \log(1+x)$.
(b) f, Tro — p*(p + (0 — p))“"at 
= fo Tryp ‘op! -DU+t(yp oye! -D) "ep oyp | — Nat 
=Trp((/p o/p | — 1) — log + (/p c/o — 2) 
= —Trplog(/p 'o./p ') =Trplog(./po~'/p). 
Exercise 6.30 We denote the pinching corresponding to the spectral decomposi- 
tion of 07! (a! pa'/?)!/2g-1/? by &. Then, it is enough to show D®(plla) = 
DO (K(p)||[K(o)) and Di (K(p)||K(o)) = —2logTr|/p/o]|. Since K(L) = L, 
KUT? ,p) — TT pt,s*(P) and Jp; equals the Fisher information of IT) .K(P)s which 
is calculated to J“. Hence, we obtain D (p||o) = D (K(p)||K(0)). 

Since Exercise 3.21 shows Tr |./p./o| = Tr k(p)'/*«(o)'/”, the relation (2.26) 
yields that D© («(p)||K(o)) = —2log Tr | pol. 


Exercise 6.31 


(a) Use the partial integration formula twice. 
(c) Use (6.32). 
(d) Use (a), (b), (c) and the fact that “ = 0. 


Exercise 6.32 The equivalence of © and @ follows from (6.63). The equivalence of 
@ and @ follows from Exercise 6.7. 


Exercise 6.33 Apply (6.64) to the pinching ken. Then, combining Exercise 5.44, we 


have limy oo 4D” (p®"||o&") = lim, . 00 4 D(Kzen (p®")||o®”") = D(pllo). Due to 
the relation Jo, < Jo,», (6.63) implies that D“” (p®"||0®") < D” (p®" ||o®") =
nD(p\||c). Thus, we obtain the desired argument. 


Exercise 6.34 Similar to Exercise 6.30, since kz, ,(Lo,s,n) = Lo,s,n, we havenJp,.s = 


a" m 

Exercise 6.35 Consider a state family where A = See and pg, = p, and let & be 
: jaan (m) 

given by a POVM M. From property (6.24) and ||p~ "||"; = 1, we have if ie 
(ADIN. S AI. Using Exercise 6.34 with n = 1, we have In(A[2., = = 


|| Al”). Thus, we obtain the desired argument. 
Exercise 6.36 This exercise can be shown in the same way as Exercise 2.40.
Exercise 6.37 


(a) Let K be the difference K between O(M, 6) —@and j Lo.s- Then, Kp+pK =0 
when Condition @ holds. 

(b) In the proof given in Exercise 6.36, the bottleneck is showing that the POVM 
M” is the spectral decomposition of O(M", 6,) from @ because this step uses the 
condition pg > O in Exercise 6.36. Hence, it is sufficient to show this step. Assume 
that © holds. Then, the equality in (6.77) holds. Thus, we obtain 


0=Trpp (x (4, w) — 4) M"w) (Bn) — 8) — (0—M", 8) - oy) 


Ww 


=a bs 6, (w)M" (w)8,(w) — O(M", i’) : 


Since 


O= i a0 Tr po. Bn (w)M" (w)8n(w) — O(M", i’) dd 


=Tr ( / pond) (x 6,(w)M" (w)6,(w) — O(M", ur), 


the condition (6.82) guarantees that >”, 6, (w)M” (w)b, (w) — O(M", 6,)2 = 0 
Hence, the POVM M” is the spectral decomposition of O(M", 6,). 


Exercise 6.38 Apply the same discussion as (6.77) to the state family { vas 


Exercise 6.39 Due to the relation lim,_,9 5 inf |g —¢\>¢ D(po || po) = 5 Jobs Theorem 
6.8 can be shown in the same way as the proof of Theorem 2.9.


Exercise 6.40 


sapere < 6}. Bi def {o+ “=u 2<6<0+%),G=1,...,m), and 


def 


Bynti pa {04+ é6< 6}, and consider a POVM M; S M(B;) composed of m + 1 
outcomes. Then, applying the monotonicity for fidelity (3.55), we obtain the desired
inequality. 

(c) The desired inequality follows from the combination of the preceding inequalities. 
(d) This statement follows from (c). 

(e) In the inequality given in (d), take the limit — oo. Next, take the limitm — oo. 
Finally, taking the limit « + 0, we obtain (6.91). 


Exercise 6.41 Taking the limitd — 0, we have 


1 
5M, 0) = lim = | inf (0'(M, 0, 55) + 3'(M, 0+, (1—s)d)). 


(6.33) implies that 


1 


1 
qe =lim oS — 2log Tr |./p9./po+6l- 


Hence, combining (6.91), we obtain (6.92). 
Exercise 6.42 


(b) Since set inf { J}’| M POVM on H} is a convex set, trn(J")! 


= tro 12d io ' > inf {tr(J})~!| M POVM on H}. Combining (6.100), we 
obtain (6.104). 


Exercise 6.43 


(M(w),M(w))4, 


(a) First, notice that a 
(M(w),T) 5 


= Tr M(w). Then, taking the sum for w, we obtain 


the desired argument. 
(Mw), Lo,is)geLo,js Moy 


(Mw), 19 
(Mw), Lo,js)(Lo,j,5, Mw) 
(M(w), 1) 


. Thus, 


. M _ 
(b) We notice that Jij = ys 


wW 


Jose => Jans >. 


i,j Ww 
> (Mw), L) (Lo, js, Mw))9 
@ jal (Mw), NO 


Mw), 1), Mw) 
(c) We have | = = "0, a{ @ us . Hence, as shown in Exercise A.1, 
a (M (w), T)6 5 
SS Lois. (Lo, js) + IN (7| can be regarded as the projection to the subspace 
spanned by J, 1 {spe elutes Thus, 


a (Mw), Li oy (L0,js- Mego > (M(w), DOT, Mwy) 
7 (Mw), 1)? ri (Mw), 1)4? 


M(w), Mwy) 


£7 eae =dimH, 


a (Mw), DN? 
which implies (6.105). Due to the above discussion, the equality holds only when
every element M(w) belongs to the subspace spanned by /, Lois, ..., Lo.d,s- 

(d) The equality in (6.105) holds only when every element M (w) satisfying Tr M (w) 
po = (Mw), bisa > 0 belongs to the subspace spanned by /, Lo15,..., Lo.d.s- 
(e) Consider a POVM M' = {M;} of rank Mi’ = 1 and a stochastic transition matrix 
Q = (Q')) such that M,, = >, Q',M!. Due to Exercise 2.42, we have JM > J™. 
Hence, we obtain (6.105). 


Exercise 6.44 If $(M, \hat\theta)$ is a locally unbiased estimator, we can show that $V_\theta(M, \hat\theta) \ge (J^M_\theta)^{-1}$ in the same way as (2.139).

Then, we can show that for each POVM $M$ there exists a function $\hat\theta$ such that $(M, \hat\theta)$ is a locally unbiased estimator and $V_\theta(M, \hat\theta) = (J^M_\theta)^{-1}$.


Exercise 6.45 Use the method of Lagrange multipliers. 


Exercise 6.46 First, notice that (|| L(u) || )’ = (u|Jo,s|u). Then, we can show that 


Pos 


(x| JP |x) = Tr LO) eae (L(x) = (LO) Ka py,s (L(x) 


Pos 
[L(u)) ,(L(u)| 
=(L SM". on.s L @) + L Pa,8 L (e) 
(LOM Me pu sl LOO ing = (LOVE a (LON 
for x € R*@. Hence, we obtain (6.107). 
dp, 
Exercise 6.47 Since J“ = >) <9 eae we have 


oe QT Mo) x =A) Tr eM) 
XTi pM, (1 — A) Tr pp M!, 


w’ EQ’ 
(Tr 4 da M,,)? (Tr “+ m',)? - ae 
=) d = XJ, Liens, 
a Ti pM, wh ss) Dz Th pM’, 2 me ) e 
weEQ w'EQ’ 


Exercise 6.48 (6.108) follows from (6.107) and Exercise 6.47. 
Exercise 6.49 The inequality > follows from Exercise 6.45 and (6.105). Hence it is 
sufficient to show the existence of aPOVM M such that tr( tig 9) a = (w aie 65 ‘) . Let 


uj,...,Uq be the eigenvectors of Jg,;, and let p; be the eigenvalues of = Jno ae 


Jy? 
1 


-i Jos: 


tr Joe 


Then, the RHS of (6.108) is equal to 


Exercise 6.50 Choose the new coordinate 6’ such that the SLD Fisher information 
matrixis /G J 6sVG . Hence, we have 


re 
inf { tr(J}/)~'| M POVM on H} = (« (ve JosVG- ‘) oh 
Since VG sit G = Ji, we have tr G(JM")-! = tr(JM")~!. Thus, we obtain
(6.109) 


Exercise 6.51 Apply Theorem 6.2 to the TP-CP map pg +> Ko(p). 


Exercise 6.52 In this case, the outcome $i'$ by the PVM $\{P_i\}$ coincides with the initial $i$.
Hence, the output distribution of the PVM { P;} is equal to pg (i). Therefore, applying 
Theorem 6.2 to the measurement by the PVM {P;}, we obtain the inequality opposite 
to (6.110), which implies the equality of (6.110). 


Exercise 6.53 It follows from the fact that (®,|X'Z/|®,) = 0 unless i and j are 0. 


Exercise 6.54 As shown in Exercise 6.51, the Fisher information is bounded by that 
of the distribution family {pg}. This bound can be attained if and only if the input is 
the maximally entangled state because the condition holds only in this case. 


Exercise 6.55 


(b) First, we consider the case when Y is given as the RHS of the equation in (a). If 
X satisfies X > iY, the diagonal elements of X is greater than the following: 


This fact can be shown by applying the projection to the 2-dimensional space spanned by the $2i$-th and $(2i-1)$-th components.

Hence, the LHS of (6.112) is greater than Tr |iY|. When X = |iY|, the condition 
X > iY holds. So, we obtain (6.112). The general case can be reduced to this special 
case by applying the orthogonal matrix V given in (a). 


(c) 
min{tr V|V : real symmetricV > Joi 
=tr Re(J5 1) + min{tr X|X : real symmetricX > iIm(J5\)} 
=trRe(Jj\)+ tr|Im(J5.)I- 
Exercise 6.56 
(a) Let P be the projection to Ty. Then, define P (X) := (PX!,..., PX%). When 


6) = Tré See Xi, we have Tr See pXi = (Lois, PX)O, = (PLois, X!), = 
(Lois, X/ Ve, = /. Since (PX, I— P)X)) , = 0, we have Vg(X) = Vo(P(X))+ 
Vo(U — P)(X)), which implies tr Re V(X) +tr| Im Vo(X)| > trRe Vg(P(X)) + 


tr | Im V(P(X))|. Then, we can show the desired argument. 
(b) For any matrix X on H, we define X") := 1°", 1 @1@X @1@1. The
—— —— 
i-l n—-i 
space spanned by the orbits of SLD of the state pa is given as {X“|X € Ty}. When 


X satisfies Tr 2x/ = 6/, we have Tr xi = 6/. Then, trRe Vo((Xi)) + 
tr| Im Vo((xi))| -_ Ltr Re V,(X) +tr | Im V,(X)|). Thus, we obtain the desired 
argument. 


Exercise 6.57 


(a) Define the vector Ey(X) := ((Tr X‘ po) J) and the matrix vg(X) := (Tr X! pg Tr X/ 
po)i,;- Then, we have 


V(X) = Vo(X — Eo(X)) + v9(X), (6.156) 


which implies tr Re Vg(X)-+tr | Im V9(X)| = tr Re Vp(X— Ep (X))+tr | Im Vo (X— 

Eo(X))| + tr ve(X). When the matrix-valued vector X satisfies the condition oH = 

Tr 56 x J, the matrix-valued vector X’ = X — E4(X) also satisfies it. The matrix- 

valued vector X’ satisfies the additional condition Tr pg X n= 9g So, the minimum 

value is realized when the additional condition Tr pg X' = 0 holds. 

(b) Apply (6.76) to the case when O(M", 6,) is replaced by >); a; X', where (a;) is an 
> (alVo(X)la). 

Since (a;) is an arbitrary complex-valued vector, we have Vg(M, @) > Vo(X). 

(c) Since Vo(M, 6) > V(X), we can show that tr Vg(M, 6) > tr(Re Vo(X) + 

| Im V»(X)|) in the same way as Exercise 6.55. 

(d) Combining (6.100) and (6.106), we have 


arbitrary complex-valued vector. Hence, we obtain (a|V9(M, 6) la) > 


tr lim nV9(M", 6,) 
noo 
>Timn inf {tr V,(M", 6 (M, 6) ; a locally unbiased estimator] _ (6.157) 
When (M, 6) is a locally unbiased estimator, the matrix-valued vector X’ := (O(M", 


6')) satisfies the condition 5 = Tr Sp x Due to (c), we have tr Vo(M, 0) > 
tr(Re V(X’) + | Im V9(X’)|). Thus, we obtain 


inf {tr Vi(M ies 6) (M, 6) : alocally unbiased estimator] 


. Ap?" _. 
= min { rRe Vo(X) + tr| Im V9(X)| |5/ = Tr wee ; (6.158) 


Oo! 


Combining (6.114), (6.115), (6.157), and (6.158), we obtain (6.101) 
Exercise 6.58 


(d) Due to (b) of Exercise 6.55, we can see that | Im V(x)| — Im V(x) > 0. 
(e) Vi_y) = Vix) + Vw) = Re V(x) + Im V(x) + | Im V(x)| — Im V(x) =
Re V(x) + |Im V(x)|. 

(f) Let (M = {M;}, 6) be the estimator given in (c) for the set of vectors (y') in 
the extended system given. Let P be the projection to the original space. Define 
another POVM M’ = {PM,P}. Then, Ei(M',0)u = PE/(M,6)u = x/. Also, 
Vo(M', 0) = Vo(M, 0) = V(y) = Re V(x) + |Im V(x)]. 

(g) For any $x$ satisfying $\delta_i^j = \langle x^i|u_j\rangle$, the above argument shows the existence of a locally unbiased estimator $M$ such that $V(M) = \mathrm{Re}\,V(x) + |\mathrm{Im}\,V(x)|$.

(h) A vector (x') satisfying (x'|uj;) = 5 is limited to the vector x/ = 
((Re J)~')'/u;. In this case, we have Re V(x) + |ImV(x)| = tr(ReJ)~! + 
tr|(Re J)! Im J (Re J)“!}. 
Chapter 7
Quantum Measurements and State 
Reduction 


Abstract In quantum mechanics, the state reduction due to a measurement is called 
the collapse of a wavefunction. Its study is often perceived as a somewhat mystical 
phenomenon because of the lack of proper understanding. As a result, the formalism 
for the state reduction is often somewhat inadequately presented. However, as will be 
explained in Sect. 7.1, the state reduction due to a measurement follows automatically 
from the formulation of quantum mechanics, as described in Sect. 1.2. Starting with 
the formulation of quantum mechanics given in Sects. 1.2 and 1.4, we give a detailed 
formulation of the state reduction due to a measurement. In Sect. 7.2, we discuss the 
relation with the uncertainty relation using these concepts. Finally, in Sect. 7.4, we 
propose a measurement with negligible state reduction. 


7.1 State Reduction Due to Quantum Measurement 


In previous chapters, we examined several issues related to quantum measurement; 
these issues were concerned only with the probability distribution of the measurement 
outcomes. However, when we examine an application of a measurement after another 
application of a measurement to the same quantum system, we need to describe the 
state reduction due to the first measurement. First, we discuss the state reduction 
due to a typical measurement corresponding to a POVM $M = \{M_\omega\}$. Then, we give
the general conditions for state reduction from the axiomatic framework given in 
Sect. 1.2. 

Assume that we perform a measurement corresponding to the POVM $M = \{M_\omega\}$ and obtain a measurement outcome $\omega$. When the state reduction has the typical form due to the POVM $M = \{M_\omega\}$, the resultant state is

$$\frac{1}{\mathrm{Tr}\,\rho M_\omega}\sqrt{M_\omega}\,\rho\,\sqrt{M_\omega}, \tag{7.1}$$

where $\frac{1}{\mathrm{Tr}\,\rho M_\omega}$ is the normalization factor.¹ In particular, if $M$ is a PVM, then $M_\omega$ is a projection and therefore the above state is [1]

$$\frac{1}{\mathrm{Tr}\,\rho M_\omega}M_\omega\,\rho\,M_\omega. \tag{7.2}$$

¹Normalization here implies the division of the matrix by its trace such that its trace is equal to 1.


Since (7.2) is sandwiched by projection operators, the above-mentioned state reduc- 
tion is called the projection hypothesis. In many books on quantum mechanics, the 
state reduction due to a measurement is restricted only to that satisfying the projec- 
tion hypothesis. However, this is in fact incorrect, and such a state reduction is merely 
typical. That is, it is not necessarily true that any state reduction corresponding to 
the POVM $M$ satisfies (7.2). In fact, other types of state reductions can
occur due to a single POVM M, as will be described later. 

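Now, we assume that we are given an initial state $\rho$ on a composite system $H_A \otimes H_B$. When we perform a measurement corresponding to the POVM $M^B = \{M^B_\omega\}_\omega$ on the system $H_B$ and obtain the measurement outcome $\omega$, the resultant state of $H_A$ is then

$$\frac{1}{\mathrm{Tr}\,\rho(I_A\otimes M^B_\omega)}\,\mathrm{Tr}_B\,(I_A\otimes\sqrt{M^B_\omega})\,\rho\,(I_A\otimes\sqrt{M^B_\omega}), \tag{7.3}$$

regardless of the type of state reduction on $H_B$, as long as the measurement outcome $\omega$ obeys the distribution $\mathrm{Tr}_B\,(\mathrm{Tr}_A\,\rho)M^B_\omega$. To prove this fact, we consider an arbitrary POVM $M^A = \{M^A_x\}_{x\in\mathcal{X}}$ on $H_A$. Since $M^A_x\otimes M^B_\omega = (I_A\otimes\sqrt{M^B_\omega})(M^A_x\otimes I_B)(I_A\otimes\sqrt{M^B_\omega})$, the joint distribution of $(x, \omega)$ is given by

$$\mathrm{Tr}\,\rho(M^A_x\otimes M^B_\omega) = \mathrm{Tr}\,(I_A\otimes\sqrt{M^B_\omega})\,\rho\,(I_A\otimes\sqrt{M^B_\omega})(M^A_x\otimes I_B)$$

$$= \mathrm{Tr}\,\bigl[\mathrm{Tr}_B\,(I_A\otimes\sqrt{M^B_\omega})\,\rho\,(I_A\otimes\sqrt{M^B_\omega})\bigr]M^A_x,$$

according to the discussion of Sect. 1.4, e.g., (1.26). When the measurement outcome $\omega$ is observed, the probability distribution of the other outcome $x$ is

$$\frac{1}{\mathrm{Tr}\,\rho(I_A\otimes M^B_\omega)}\,\mathrm{Tr}\,\bigl[\mathrm{Tr}_B\,(I_A\otimes\sqrt{M^B_\omega})\,\rho\,(I_A\otimes\sqrt{M^B_\omega})\bigr]M^A_x,$$

which is the conditional distribution of $x$ when the measurement outcome on $H_B$ is $\omega$. Since this condition holds for an arbitrary POVM $M^A = \{M^A_x\}_{x\in\mathcal{X}}$ on $H_A$, (7.3) gives the resultant state of $H_A$ when $\omega$ is the outcome of the measurement $M^B$ on $H_B$.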
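As a numerical illustration of (7.3), the following minimal sketch computes the resultant state on $H_A$ after a POVM element on $H_B$ has been observed. The helper names (`psd_sqrt`, `partial_trace_b`, `reduced_state_on_a`) and the two-qubit input state are assumptions chosen only for this example; they do not appear in the text.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix (eigendecomposition)."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def partial_trace_b(rho, d_a, d_b):
    """Trace out the second factor of a matrix acting on H_A (x) H_B."""
    return np.einsum("ajbj->ab", rho.reshape(d_a, d_b, d_a, d_b))

def reduced_state_on_a(rho, m_b, d_a, d_b):
    """Normalized state on H_A given by (7.3) for the POVM element m_b on H_B."""
    s = np.kron(np.eye(d_a), psd_sqrt(m_b))
    unnormalized = partial_trace_b(s @ rho @ s, d_a, d_b)
    return unnormalized / np.trace(unnormalized).real

# Hypothetical input: the two-qubit state (|00> + |11>)/sqrt(2).
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho_ab = np.outer(phi, phi.conj())
m0 = np.diag([1.0, 0.0])  # projection onto |0> on H_B
rho_a = reduced_state_on_a(rho_ab, m0, 2, 2)
```

For this maximally entangled input, observing the projection onto $|0\rangle$ on $H_B$ leaves $H_A$ in the state $|0\rangle\langle 0|$, independently of how the state of $H_B$ itself is reduced.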

However, since (7.3) only describes the state reduction of a system that is not directly measured, the above discussion does not directly deal with the state reduction on the system $H_B$, e.g., (7.2). As shown from the Naimark–Ozawa extension [2, 3] given below, it is theoretically possible to perform a measurement such that the state reduction follows (7.1) or (7.2).


Theorem 7.1 (Naimark [2], Ozawa [3]) Consider an arbitrary POVM $M = \{M_\omega\}_{\omega\in\Omega}$ on $H$ and an arbitrary outcome $\omega_0\in\Omega$. Let $H_0$ be the additional space with the orthonormal basis $\{u_\omega\}_\omega$, and let us define the PVM $E_\omega = |u_\omega\rangle\langle u_\omega|$ on $H_0$. There exists a unitary matrix $U$ such that

$$\mathrm{Tr}\,\rho M_\omega = \mathrm{Tr}\,(I\otimes E_\omega)U(\rho\otimes\rho_0)U^*, \tag{7.4}$$

$$\sqrt{M_\omega}\,\rho\,\sqrt{M_\omega} = \mathrm{Tr}_{H_0}\,(I\otimes E_\omega)U(\rho\otimes\rho_0)U^*(I\otimes E_\omega), \tag{7.5}$$

where $\rho_0 = |u_{\omega_0}\rangle\langle u_{\omega_0}|$.

Theorem 4.5 can be obtained from this theorem by considering a PVM $\{U^*(I\otimes E_\omega)U\}_\omega$.

In the following, we make several observations based on this theorem. As described by (7.4), a measurement corresponding to an arbitrary POVM $M$ can be realized by a PVM $E = \{E_\omega\}$ with an appropriate time evolution $U$ between $H$ and $H_0$. Furthermore, according to the above arguments, when a measurement corresponding to the PVM $E$ on the system $H_0$ is performed, the resultant state of $H$ with the measurement outcome $\omega$ is given by [3]

$$\frac{1}{\mathrm{Tr}\,U(\rho\otimes\rho_0)U^*(I\otimes E_\omega)}\,\mathrm{Tr}_{H_0}\,(I\otimes E_\omega)U(\rho\otimes\rho_0)U^*(I\otimes E_\omega).$$


Theorem 7.1 therefore shows that the above procedure produces a measurement corresponding to the resultant state (7.1). This model of measurement is called an indirect measurement model. The additional space $H_0$ is called an ancilla, which interacts directly with the macroscopic system. In this way, this model describes the resultant state of the system $H$ of our interest when the measurement outcome $\omega$ is obtained in the ancilla. However, it does not reveal anything about the process whereby the measurement outcome is obtained in the ancilla. Hence, there remains an undiscussed part in the process whereby the measurement outcome is obtained via an ancilla. This is called the measurement problem.

In almost all real experiments, the target system $H$ is not directly but indirectly
measured via the measurement on an ancilla. For example, consider the Stern— 
Gerlach experiment, which involves the measurement of the spin of silver atoms. 
In this case, the spin is measured indirectly by measuring the momentum of the 
atom after the interaction between the spin system and the angular momentum sys- 
tem of the atom. Therefore, in such experiments, it is natural to apply the indirect 
measurement model to the measurement process. 

The above theorem can be regarded as the refinement of the Naimark extension for 
the measurement of real quantum systems [3]. As this construction was first given
by Ozawa, the triple $(H_0, \rho_0, U)$ given in Theorem 7.1 is called the Naimark–Ozawa
extension. 


Proof of Theorem 7.1 For simplicity, we consider the case when the probability space $\Omega$ has a finite cardinality. Without loss of generality, we can assume that the probability space $\Omega$ is $\{1,\ldots,n\}$, the orthonormal basis of $H_0$ is given by $\{u_i\}_{i=1}^n$, and $\rho_0$ is given by $|u_1\rangle\langle u_1|$. First, let us check that an arbitrary matrix $U$ on $H\otimes H_0$ can be written as $(U^{i,j})_{i,j}$, using matrices $U^{i,j}$ on $H$. This implies that $(I\otimes|u_i\rangle\langle u_i|)\,U\,(I\otimes|u_j\rangle\langle u_j|) = U^{i,j}\otimes|u_i\rangle\langle u_j|$. Accordingly, (7.5) is equivalent to $\sqrt{M_i}\,\rho\,\sqrt{M_i} = U^{i,1}\rho\,(U^{i,1})^*$ with $\omega = i$. Choosing $U^{i,1} = \sqrt{M_i}$, we have $\sum_i U^{i,1}(U^{i,1})^* = I$, and therefore, it is possible to choose the remaining elements such that $U = (U^{i,j})_{i,j}$ is a unitary matrix, according to Exercise 1.2. This confirms the existence of a unitary matrix $U$ that satisfies (7.5). Taking the trace in (7.5) gives us (7.4). ∎
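The construction in this proof can be sketched numerically. The code below is only an illustration under stated assumptions: the ancilla $H_0$ is placed as the first tensor factor (so that the blocks $U^{i,j}$ are contiguous slices), the remaining columns of $U$ are filled in with an orthonormal basis obtained from a singular value decomposition (one of many valid completions), and the three-outcome qubit POVM and the function name `naimark_ozawa_unitary` are hypothetical choices for the example.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def naimark_ozawa_unitary(povm):
    """Unitary on H_0 (x) H whose first block column is (sqrt(M_1), ..., sqrt(M_n))."""
    d = povm[0].shape[0]
    v = np.vstack([psd_sqrt(m) for m in povm])       # isometry of shape (n*d, d)
    left, _, _ = np.linalg.svd(v, full_matrices=True)
    complement = left[:, d:]                         # orthonormal basis of range(v)^perp
    return np.hstack([v, complement])

# Hypothetical three-outcome qubit POVM (rank-one elements at 120-degree spacing).
povm = []
for t in (0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0):
    psi = np.array([np.cos(t / 2.0), np.sin(t / 2.0)])
    povm.append((2.0 / 3.0) * np.outer(psi, psi))

d, n = 2, len(povm)
u = naimark_ozawa_unitary(povm)
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
rho0 = np.zeros((n, n))
rho0[0, 0] = 1.0                                     # |u_1><u_1| on the ancilla
joint = u @ np.kron(rho0, rho) @ u.conj().T
# Applying E_i on the ancilla and tracing it out selects the (i, i) diagonal block.
blocks = [joint[i * d:(i + 1) * d, i * d:(i + 1) * d] for i in range(n)]
```

Each entry of `blocks` should agree with $\sqrt{M_i}\,\rho\,\sqrt{M_i}$ as in (7.5), and its trace with $\mathrm{Tr}\,\rho M_i$ as in (7.4).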


We next consider the possible state reductions according to the framework for a quantum measurement given in Sect. 1.2. Perform the measurement corresponding to a POVM $M$ on a quantum system in a state $\rho$. Then, using the map $\kappa_\omega$, we describe the resultant state with a measurement outcome $\omega$ by

$$\frac{1}{\mathrm{Tr}\,\rho M_\omega}\,\kappa_\omega(\rho). \tag{7.6}$$

The map $\kappa_\omega$ can be restricted to a completely positive map as shown below. In order to show this fact, we prove that

$$\kappa_\omega(\lambda\rho_1 + (1-\lambda)\rho_2) = \lambda\kappa_\omega(\rho_1) + (1-\lambda)\kappa_\omega(\rho_2) \tag{7.7}$$

for two arbitrary states $\rho_1$ and $\rho_2$ and an arbitrary real number $\lambda$ satisfying $0 < \lambda < 1$. Consider an application of a measurement corresponding to another arbitrary POVM $\{M'_{\omega'}\}_{\omega'\in\Omega'}$ after the first measurement. This is equivalent to performing a measurement with the probability space $\Omega\times\Omega'$ on the system in the initial state. The joint probability distribution of $\omega$ and $\omega'$ is then given by $\mathrm{Tr}\,\kappa_\omega(\rho)M'_{\omega'}$ (Exercise 7.3). Consider the convex combination of the density matrix. Then, similarly to (1.11), the equation

$$\mathrm{Tr}\,\kappa_\omega(\lambda\rho_1 + (1-\lambda)\rho_2)M'_{\omega'} = \lambda\,\mathrm{Tr}\,\kappa_\omega(\rho_1)M'_{\omega'} + (1-\lambda)\,\mathrm{Tr}\,\kappa_\omega(\rho_2)M'_{\omega'}$$

should hold. Since $\{M'_{\omega'}\}_{\omega'\in\Omega'}$ is an arbitrary POVM, it is also possible to choose $M'_{\omega'}$ to be an arbitrary one-dimensional projection. Therefore, we obtain (7.7). Taking entangled input states into account, we then require $\kappa_\omega$ to be a completely positive map. This statement can be shown by using arguments similar to that of Sect. 5.1. That is, it can be verified by adding a reference system. Since (7.6) is a density matrix, we obtain [4]


$$\mathrm{Tr}\,\kappa_\omega(\rho) = \mathrm{Tr}\,\rho M_\omega, \quad \forall\rho, \tag{7.8}$$

which is equivalent to $M_\omega = \kappa_\omega^*(I)$. Thus $\mathrm{Tr}\sum_\omega\kappa_\omega(\rho) = 1$. The measurement with the state reduction is represented by the set of completely positive maps $\kappa = \{\kappa_\omega\}_{\omega\in\Omega}$, where the map $\sum_\omega\kappa_\omega$ preserves the trace [3]. Henceforth, we shall call $\kappa = \{\kappa_\omega\}_{\omega\in\Omega}$ an instrument. In this framework, if $\rho$ is the initial state, the probability of obtaining a measurement outcome $\omega$ is given by $\mathrm{Tr}\,\kappa_\omega(\rho)$. Once a measurement outcome $\omega$ is obtained, the resultant state is given by $\frac{1}{\mathrm{Tr}\,\kappa_\omega(\rho)}\kappa_\omega(\rho)$. We can also regard the Choi–Kraus representation $\{A_i\}$ of a TP-CP map as an instrument $\{\kappa_i\}$ with the correspondence $\kappa_i(\rho) = A_i\rho A_i^*$. The notation "an instrument $\{A_i\}$" then actually implies an instrument $\{\kappa_i\}$ with the above correspondence. Therefore, the state evolution with a Choi–Kraus representation $\{A_i\}$ can be regarded as a state reduction given by the instrument $\{A_i\}$ when the measurement outcome is not recorded. (The measurement is performed, but the experimenter does not read the outcome.)
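The correspondence between a Choi–Kraus representation and an instrument can be illustrated with a short sketch. The amplitude-damping Kraus operators, the parameter `gamma`, and the function name `instrument_step` are assumptions made for the example only.

```python
import numpy as np

def instrument_step(kraus, rho):
    """Outcome probabilities Tr kappa_i(rho) and normalized resultant states
    for the instrument kappa_i(rho) = A_i rho A_i^*."""
    probs, states = [], []
    for a in kraus:
        out = a @ rho @ a.conj().T
        p = np.trace(out).real
        probs.append(p)
        states.append(out / p if p > 0 else None)
    return probs, states

# Hypothetical example: amplitude-damping Kraus operators on a qubit.
gamma = 0.25
a0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
a1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
rho = np.array([[0.5, 0.5], [0.5, 0.5]])

probs, states = instrument_step([a0, a1], rho)
# If the outcome is not recorded, the states are averaged with their probabilities.
unrecorded = sum(p * s for p, s in zip(probs, states))
```

Averaging the resultant states over the unread outcome reproduces the TP-CP map $\rho\mapsto\sum_i A_i\rho A_i^*$, which is the sense in which the state evolution is a state reduction whose outcome is not recorded.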

When the instrument is in the form of the square roots $\{\sqrt{M_\omega}\}$ of a POVM $M = \{M_\omega\}_{\omega\in\Omega}$, the resultant state is given by (7.1) and will be denoted by $\kappa_M$. If the instrument $\kappa = \{\kappa_\omega\}_{\omega\in\Omega}$ and the POVM $M = \{M_\omega\}_{\omega\in\Omega}$ satisfy condition (7.8), we shall call the instrument $\kappa$ an instrument corresponding to the POVM $M$. We can characterize an instrument corresponding to a POVM $M$ as follows.


Theorem 7.2 Let $\kappa = \{\kappa_\omega\}_{\omega\in\Omega}$ be an instrument corresponding to a POVM $M = \{M_\omega\}$ in a quantum system $H$. There exists a TP-CP map $\kappa'_\omega$ for each measurement outcome $\omega$ such that

$$\kappa_\omega(\rho) = \kappa'_\omega\bigl(\sqrt{M_\omega}\,\rho\,\sqrt{M_\omega}\bigr). \tag{7.9}$$

According to this theorem, it is possible to represent any state reduction $\kappa$ as a combination of the typical state reduction (7.1) and the state evolution $\kappa'_\omega$ that depends on the outcome $\omega$ of the POVM $M$.²

Proof From the condition of Theorem 5.14 that gives the Choi–Kraus representation, there exists a set of matrices $E_1,\ldots,E_k$ such that $\kappa_\omega(\rho) = \sum_{i=1}^k E_i\rho E_i^*$. Since $\mathrm{Tr}\,\kappa_\omega(\rho) = \mathrm{Tr}\,\rho\sum_{i=1}^k E_i^*E_i = \mathrm{Tr}\,\rho M_\omega$ for an arbitrary state $\rho$, we have $\sum_{i=1}^k E_i^*E_i = M_\omega$. Using the generalized inverse matrix $\sqrt{M_\omega}^{-1}$ defined in Sect. 1.5, and letting $P$ be the projection to the range of $M_\omega$ (or $\sqrt{M_\omega}$), we have

$$\sum_{i=1}^k \sqrt{M_\omega}^{-1}E_i^*E_i\sqrt{M_\omega}^{-1} = P.$$

Hence, the matrices $E_1\sqrt{M_\omega}^{-1},\ldots,E_k\sqrt{M_\omega}^{-1}, I-P$ are the Choi–Kraus representations of a TP-CP map. Denoting this TP-CP map as $\kappa'_\omega$, we have

$$\kappa'_\omega\bigl(\sqrt{M_\omega}\,\rho\,\sqrt{M_\omega}\bigr) = (I-P)\sqrt{M_\omega}\,\rho\,\sqrt{M_\omega}(I-P) + \sum_{i=1}^k E_i\sqrt{M_\omega}^{-1}\sqrt{M_\omega}\,\rho\,\sqrt{M_\omega}\sqrt{M_\omega}^{-1}E_i^* = \sum_{i=1}^k E_i\rho E_i^*.$$

Therefore, we see that (7.9) holds. ∎

²It can also be understood as follows: the state reduction due to any measurement by PVM can be characterized as the state reduction satisfying the projection hypothesis, followed by the state evolution $\kappa'_\omega$. Indeed, many texts state that the state reduction due to any measurement is given by the projection hypothesis. Theorem 7.2 guarantees their correctness in a sense.
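The decomposition (7.9) can be checked numerically with a small sketch of the construction used in the proof. The Kraus operators `e1`, `e2` below are hypothetical and chosen so that $M_\omega$ is rank deficient, which makes the extra Kraus operator $I-P$ nontrivial; the function names are likewise assumptions for the example.

```python
import numpy as np

def sqrt_pinv_sqrt_and_projection(m, tol=1e-12):
    """Return sqrt(m), its generalized inverse, and the projection onto range(m)."""
    w, v = np.linalg.eigh(m)
    pos = w > tol
    safe = np.where(pos, w, 1.0)                      # avoid division by zero
    root = np.where(pos, np.sqrt(safe), 0.0)
    inv_root = np.where(pos, 1.0 / np.sqrt(safe), 0.0)
    sq = (v * root) @ v.conj().T
    isq = (v * inv_root) @ v.conj().T
    proj = (v * pos.astype(float)) @ v.conj().T
    return sq, isq, proj

def apply_kraus(ops, x):
    """Apply the completely positive map x -> sum_k K x K^*."""
    return sum(k @ x @ k.conj().T for k in ops)

# Hypothetical Kraus operators E_1, E_2 of a single instrument element kappa_omega.
e1 = np.sqrt(0.5) * np.array([[0.0, 0.0], [1.0, 0.0]])   # sends |0> to |1>
e2 = 0.5 * np.array([[1.0, 0.0], [0.0, 0.0]])
kraus = [e1, e2]

m_omega = sum(e.conj().T @ e for e in kraus)              # equals 0.75 |0><0|
sq, isq, proj = sqrt_pinv_sqrt_and_projection(m_omega)
kraus_prime = [e @ isq for e in kraus] + [np.eye(2) - proj]

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
lhs = apply_kraus(kraus, rho)                             # kappa_omega(rho)
rhs = apply_kraus(kraus_prime, sq @ rho @ sq)             # kappa'_omega(sqrt(M) rho sqrt(M))
completeness = sum(k.conj().T @ k for k in kraus_prime)
```

Here `completeness` should equal the identity, confirming that the constructed map is trace preserving, and `lhs` should coincide with `rhs`.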


Combining Theorems 7.1 and 7.2, we can construct a model of indirect measurement in a manner similar to Theorem 7.1 for an arbitrary instrument $\kappa = \{\kappa_\omega\}$.


Theorem 7.3 Let $H_A$ and $H_B$ be two quantum systems. The following two conditions are equivalent for the set of linear maps $\kappa = \{\kappa_\omega\}_{\omega\in\Omega}$ from $T(H_A)$ to $T(H_B)$.

① $\kappa$ is an instrument.
② $\kappa$ can be expressed as

$$\kappa_\omega(\rho) = \mathrm{Tr}_{A,C}\,(I_{A,B}\otimes E_\omega)\,U(\rho\otimes\rho_0)U^*\,(I_{A,B}\otimes E_\omega), \tag{7.10}$$

where $H_C$ is a quantum system with the dimension $\dim H_B\times$ (number of elements in $\Omega$), $\rho_0$ is a pure state on $H_B\otimes H_C$, $E = \{E_\omega\}_\omega$ is a PVM on $H_C$, and $U$ is a unitary matrix on $H_A\otimes H_B\otimes H_C$.

The above is also equivalent to Condition ② with arbitrary-dimensional space $H_C$, which is called Condition ②′.


If $H_A = H_B$, the following corollary holds.

Corollary 7.1 (Ozawa [3]) The following two conditions for the set of linear maps $\kappa = \{\kappa_\omega\}_{\omega\in\Omega}$ from $T(H_A)$ to $T(H_A)$ are equivalent [3].

① $\kappa$ is an instrument.
② $\kappa$ can be expressed as

$$\kappa_\omega(\rho) = \mathrm{Tr}_D\,(I_A\otimes E_\omega)\,V(\rho\otimes\rho_0)V^*\,(I_A\otimes E_\omega), \tag{7.11}$$

where $H_D$ is a $(\dim H_A)^2\times$ (number of elements in $\Omega$)-dimensional quantum system, $\rho_0$ is a pure state on $H_D$, $E = \{E_\omega\}_\omega$ is a PVM on $H_D$, and $V$ is a unitary matrix on $H_A\otimes H_D$.


Therefore, a model of indirect measurement exists for an arbitrary instrument $\kappa = \{\kappa_\omega\}$. Henceforth, we shall call $(H_D, V, \rho_0, E)$ and $(H_C, U, \rho_0, E)$ indirect measurements and denote them by $\mathcal{I}$. Let us now summarize the relation among the three different notations for measurements: $M = \{M_\omega\}$, $\kappa = \{\kappa_\omega\}$, and $\mathcal{I} = (H_D, V, \rho_0, E)$. The POVM $M$ only describes the probability distribution of the measurement outcomes and contains the least amount of information among the three notations. The instrument $\kappa$ refers not only to the measurement outcome itself, but also to the resultant state of the measurement. Hence, an instrument $\kappa$ determines a unique POVM $M$; however, a POVM $M$ does not determine the instrument uniquely. Furthermore, the indirect measurement $\mathcal{I}$ denotes the unitary evolution required to realize the measurement device as well as the resultant state and the probability distribution of the observed outcome. This is the most detailed of the three notations (i.e., it contains the most information). Hence, an indirect measurement $\mathcal{I}$ determines a unique POVM $M$ and a unique instrument $\kappa = \{\kappa_\omega\}$, although the converse is not unique.

The proof of Theorem 7.3 is as follows: ② ⇒ ②′ and ②′ ⇒ ① follow from inspection. See Exercise 7.1 for ① ⇒ ②.


Exercises 


7.1 Show that Condition ② in Theorem 7.3 may be derived from the Naimark–Ozawa extension $(H_0, \rho'_0, U')$ and Condition ① by using the Stinespring representation $(H_C, \rho_0, U_{\kappa'_\omega})$ for the TP-CP map $\kappa'_\omega$ given in Theorem 7.2.


7.2 Prove Corollary 7.1 using Theorem 7.3. 


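7.3 Consider the situation when we apply a measurement $M = \{M_{\omega'}\}$ after application of an instrument $\kappa = \{\kappa_\omega\}$ to a quantum system $H$. Define the POVM $M' = \{M'_{\omega,\omega'}\}$ by $M'_{\omega,\omega'} = \kappa_\omega^*(M_{\omega'})$, where $\kappa_\omega^*$ is the dual map of $\kappa_\omega$. Show that $\mathrm{Tr}\,\kappa_\omega(\rho)M_{\omega'} = \mathrm{Tr}\,\rho M'_{\omega,\omega'}$ for an arbitrary input state $\rho$.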


7.4 Given an initial state given by a pure state $x = (x^{k,j}) \in H_A\otimes H_B$, perform a measurement given by the PVM $\{|u_i\rangle\langle u_i|\}$ (where $u_i = (u_i^j) \in H_B$) on $H_B$. Show that the resultant state on $H_A$ with the measurement outcome $i$ is given by $\frac{v_i}{\|v_i\|}$, assuming that $v_i := \bigl(\sum_j \overline{u_i^j}\,x^{k,j}\bigr)_k \in H_A$.


7.5 Prove Theorem 5.4 following the steps below. 

(a) Using formula (7.3) for the state reduction due to a measurement, show that ② ⇒ ①.

(b) Consider a quantum system $H_C$. When a maximally entangled state is input to an entanglement-breaking channel $\kappa$, show that (i) the output is a separable state and (ii) ① ⇒ ② using relationship (5.5).


7.2 Uncertainty and Measurement 


7.2.1 Uncertainties for Observable and Measurement 


The concept of uncertainty is often discussed in quantum mechanics in various contexts. In fact, there are no fewer than four distinct implications of the word uncertainty. Despite this, the differences between these implications are rarely discussed, and consequently, "uncertainty" is often used in a somewhat confused manner. In particular, there appears to be some confusion regarding its implication in the context of the Heisenberg uncertainty principle. We define the four meanings of uncertainty in the following and discuss the Heisenberg uncertainty principle and related topics in some detail [see (7.28), (7.31), and (7.33)].

First, let us define the uncertainty of an observable $\Delta_1(X, \rho)$ for a Hermitian matrix $X$ (this can be considered an observable) and the state $\rho$ by

$$\Delta_1^2(X, \rho) := \mathrm{Tr}\,\rho X^2 - (\mathrm{Tr}\,\rho X)^2 = \mathrm{Tr}\,\rho(X - \mathrm{Tr}\,\rho X)^2. \tag{7.12}$$

Next, let us define the uncertainty of a measurement $\Delta_2(M, \rho)$ for a POVM $M = \{(M_i, x_i)\}$ with real-valued measurement outcomes and a state $\rho$ by

$$\Delta_2^2(M, \rho) := \sum_i (x_i - E_\rho(M))^2\,\mathrm{Tr}\,\rho M_i, \qquad E_\rho(M) := \sum_i x_i\,\mathrm{Tr}\,\rho M_i. \tag{7.13}$$

By defining the average matrix $O(M)$ for the POVM $M$ as below, the inequality

$$\Delta_2^2(M, \rho) \ge \Delta_1^2(O(M), \rho), \qquad O(M) := \sum_i x_i M_i \tag{7.14}$$

holds, and the equality holds when $M$ is a PVM. Inequality (7.14) can be shown as

$$\mathrm{Tr}\sum_i (x_i - E_\rho(M))^2 M_i\,\rho \ge \mathrm{Tr}\,(O(M) - E_\rho(M))^2\rho$$

because $\sum_i (x_i - E_\rho(M))^2 M_i - (O(M) - E_\rho(M))^2 = \sum_i (x_i - O(M))M_i(x_i - O(M)) \ge 0$. In particular, an indirect measurement $\mathcal{I} = (H_D, V, \rho_0, E)$ corresponding to $M$ satisfies

$$\Delta_2(M, \rho) = \Delta_2(E, V(\rho\otimes\rho_0)V^*) = \Delta_1(O(E), V(\rho\otimes\rho_0)V^*) \tag{7.15}$$

because $E$ is a PVM. Similarly, the Naimark extension $(H_B, E, \rho_0)$ of the POVM $M$ satisfies

$$\Delta_2(M, \rho) = \Delta_2(E, \rho\otimes\rho_0) = \Delta_1(O(E), \rho\otimes\rho_0). \tag{7.16}$$

Let us define the deviation $\Delta_3(M, X, \rho)$ of the POVM $M$ from the observable $X$ for the state $\rho$ by

$$\Delta_3^2(M, X, \rho) := \sum_i \mathrm{Tr}\,(x_i - X)M_i(x_i - X)\rho. \tag{7.17}$$


It then follows that the deviation of $M$ from $O(M)$ becomes zero if $M$ is a PVM. The square of the uncertainty $\Delta_2(M, \rho)$ of the measurement $M$ can be decomposed into the sum of the square of the uncertainty of the average matrix $O(M)$ and the square of the deviation of $M$ from $O(M)$ as follows:

$$\Delta_2^2(M, \rho) = \Delta_3^2(M, O(M), \rho) + \Delta_1^2(O(M), \rho). \tag{7.18}$$

When the POVM $M$ and the observable $X$ do not necessarily satisfy $O(M) = X$, the square of their deviation can be written as the sum of the square of the uncertainty of the observable $X - O(M)$ and the square of the deviation of the POVM $M$ from $O(M)$ as follows:

$$\Delta_3^2(M, X, \rho) = \Delta_3^2(M, O(M), \rho) + \Delta_1^2(O(M) - X, \rho). \tag{7.19}$$
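The definitions (7.12), (7.13), (7.17) and the decomposition (7.18) can be checked numerically. The following is a minimal sketch; the three-outcome qubit POVM, the outcome values, the state, and all function names are assumptions introduced for illustration.

```python
import numpy as np

def delta1_sq(x, rho):
    """Squared uncertainty of an observable, as in (7.12)."""
    mean = np.trace(rho @ x).real
    return np.trace(rho @ x @ x).real - mean ** 2

def delta2_sq(povm, values, rho):
    """Squared uncertainty of a measurement, as in (7.13)."""
    probs = [np.trace(rho @ m).real for m in povm]
    mean = sum(v * p for v, p in zip(values, probs))
    return sum((v - mean) ** 2 * p for v, p in zip(values, probs))

def delta3_sq(povm, values, x, rho):
    """Squared deviation of a POVM from an observable, as in (7.17)."""
    eye = np.eye(x.shape[0])
    return sum(np.trace((v * eye - x) @ m @ (v * eye - x) @ rho).real
               for v, m in zip(values, povm))

def average_matrix(povm, values):
    """Average matrix O(M) = sum_i x_i M_i."""
    return sum(v * m for v, m in zip(values, povm))

# Hypothetical three-outcome qubit POVM with outcome values -1, 0, 1.
povm = []
for t in (0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0):
    psi = np.array([np.cos(t / 2.0), np.sin(t / 2.0)])
    povm.append((2.0 / 3.0) * np.outer(psi, psi))
values = [-1.0, 0.0, 1.0]
rho = np.array([[0.7, 0.2], [0.2, 0.3]])

o = average_matrix(povm, values)
lhs = delta2_sq(povm, values, rho)
rhs = delta3_sq(povm, values, o, rho) + delta1_sq(o, rho)

# For a PVM, the deviation from its own average matrix vanishes.
pvm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
pvm_values = [1.0, -1.0]
pvm_dev = delta3_sq(pvm, pvm_values, average_matrix(pvm, pvm_values), rho)
```

In this sketch `lhs` and `rhs` are the two sides of (7.18), and `pvm_dev` illustrates why equality holds in (7.14) for a PVM.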


7.2.2 Disturbance 


Now, consider the disturbance caused by the state evolution $\kappa$ from quantum system $H_A$ to quantum system $H_B$. For this purpose, we examine how well the POVM $M = \{(M_i, x_i)\}_i$ on the final system $H_B$ recovers the observable $X$ on $H_A$. Its quality can be measured by the quantity $\Delta_3(\kappa^*(M), X, \rho)$, where $\kappa^*(M)$ denotes the POVM $\{(\kappa^*(M_i), x_i)\}_i$ on the initial system $H_A$. Since $\kappa^*$ is the dual map of the map $\kappa$, the minimum value $\Delta_4(\kappa, X, \rho) := \min_M \Delta_3(\kappa^*(M), X, \rho)$ is thought to represent the disturbance with respect to the observable $X$ caused by the state evolution $\kappa$. Using the Stinespring representation, (7.18) yields

$$\Delta_3^2(\kappa^*(M), X, \rho) = \Delta_3^2(M, O(M), \kappa(\rho)) + \Delta_3^2(\kappa^*(E_{O(M)}), X, \rho). \tag{7.20}$$

Thus, our minimization can be reduced to $\min_Y \Delta_3(\kappa^*(E_Y), X, \rho)$. Interestingly, using the Stinespring representation $(H_C, \rho_0, U_\kappa)$ of $\kappa$, we can express the quantity $\Delta_3(\kappa^*(E_Y), X, \rho)$ as

$$\Delta_3^2(\kappa^*(E_Y), X, \rho) = \mathrm{Tr}\,\bigl(U_\kappa(X\otimes I_{B,C})U_\kappa^* - (I_{A,C}\otimes Y)\bigr)^2\,U_\kappa(\rho\otimes\rho_0)U_\kappa^*. \tag{7.21}$$


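As discussed in Sect. 6.1, the matrix $\kappa_{\rho,s}(X)$ can be regarded as the image of $U_\kappa(X\otimes I_{B,C})U_\kappa^*$ by the projection to the space $\{I\otimes Y\}$. Hence, by using property (6.16), the above can be calculated as

$$\begin{aligned}
&\operatorname{Tr}\,\bigl(U_\kappa(X\otimes I_{B,C})U_\kappa^* - (I_{A,C}\otimes\kappa_{\rho,s}(X))\bigr)^2\, U_\kappa(\rho\otimes\rho_0)U_\kappa^* \\
&\quad + \operatorname{Tr}\,\bigl((I_{A,C}\otimes\kappa_{\rho,s}(X)) - (I_{A,C}\otimes Y)\bigr)^2\, U_\kappa(\rho\otimes\rho_0)U_\kappa^* \\
&= \operatorname{Tr} X^2\rho - \operatorname{Tr}\,(\kappa_{\rho,s}(X))^2\kappa(\rho) + \operatorname{Tr}\,(Y - \kappa_{\rho,s}(X))^2\kappa(\rho).
\end{aligned}\tag{7.22}$$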


Thus, this quantity gives the minimum when $Y = \kappa_{\rho,s}(X)$. That is, the matrix $\kappa_{\rho,s}(X)$ on the output system gives the best approximation of the matrix $X$ on the input system. In particular, when $\kappa$ is the partial trace, $\kappa_{\rho,s}$ can be regarded as the quantum version of the conditional expectation. Therefore, the disturbance of $X$ caused by $\kappa$ has the form

$$\begin{aligned}
\Delta_4(\kappa, X, \rho) &= \min_M \Delta_3(\kappa^*(M), X, \rho) = \min_Y \Delta_3(\kappa^*(E_Y), X, \rho) \\
&= \Delta_3(\kappa^*(E_{\kappa_{\rho,s}(X)}), X, \rho) = \sqrt{\bigl(\|X\|^{(e)}_{\rho,s}\bigr)^2 - \bigl(\|\kappa_{\rho,s}(X)\|^{(e)}_{\kappa(\rho),s}\bigr)^2}.
\end{aligned}\tag{7.23}$$

Hence, if $X$ is the SLD e representation $L_{\theta,s}$ of the derivative, this can be regarded as the loss of the SLD Fisher metric.


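Remember that $\kappa_{\rho,s}(X)$ is a kind of conditional expectation. So, $\bigl(\|\kappa_{\rho,s}(X)\|^{(e)}_{\kappa(\rho),s}\bigr)^2$ is lower bounded by $(\operatorname{Tr}\rho X)^2$. That is,

$$\Delta_4(\kappa, X, \rho) \le \Delta_1(X, \rho). \tag{7.24}$$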
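
The bound (7.24) can be illustrated numerically. The sketch below is not from the original text; it assumes the characterization of Sect. 6.1 that $\kappa_{\rho,s}(X)$ is the Hermitian solution $Z$ of $\kappa(\rho)\circ Z = \kappa(X\circ\rho)$ with $A\circ B = (AB+BA)/2$, takes $\kappa$ in Kraus form with identical input and output systems, and requires $\kappa(\rho)$ to have full rank. All function names and the example channel are illustrative.

```python
import numpy as np

def jordan(A, B):
    """Symmetrized product A o B = (AB + BA)/2."""
    return (A @ B + B @ A) / 2

def apply_channel(kraus, A):
    """kappa(A) = sum_k K A K^* for a list of Kraus operators."""
    return sum(K @ A @ K.conj().T for K in kraus)

def kappa_rho_s(kraus, X, rho):
    """Solve kappa(rho) o Z = kappa(X o rho) for Z (assumes kappa(rho) is full rank)."""
    out = apply_channel(kraus, rho)
    W = apply_channel(kraus, jordan(X, rho))
    lam, V = np.linalg.eigh(out)
    Wt = V.conj().T @ W @ V
    # In the eigenbasis of kappa(rho): Z_ij * (lam_i + lam_j) / 2 = W_ij.
    Zt = 2 * Wt / (lam[:, None] + lam[None, :])
    return V @ Zt @ V.conj().T

def delta4_sq(kraus, X, rho):
    """Tr rho X^2 - Tr kappa(rho) Z^2, the squared disturbance in (7.23)."""
    Z = kappa_rho_s(kraus, X, rho)
    return (np.trace(rho @ X @ X) - np.trace(apply_channel(kraus, rho) @ Z @ Z)).real

def delta1_sq(X, rho):
    """Variance of X in the state rho."""
    return (np.trace(rho @ X @ X) - np.trace(rho @ X) ** 2).real

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
dephasing = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # removes off-diagonal terms
identity = [np.eye(2)]
print(delta4_sq(identity, X, rho), delta4_sq(dephasing, X, rho), delta1_sq(X, rho))
```

In this example the identity channel causes no disturbance, while complete dephasing in the computational basis disturbs the observable $X$ by an amount that stays below its uncertainty $\Delta_1(X, \rho)$, as (7.24) requires.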


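Furthermore, when an instrument $\boldsymbol{\kappa} = \{\kappa_\omega\}$ is used, the disturbance of the observable $X$ is defined as

$$\Delta_4(\boldsymbol{\kappa}, X, \rho) := \Delta_4(\bar\kappa, X, \rho), \qquad \bar\kappa := \sum_\omega \kappa_\omega. \tag{7.25}$$

Letting $\mathcal{I} = (\mathcal{H}_C, U, \rho_0, E)$ be an indirect measurement corresponding to the instrument $\boldsymbol{\kappa}$, we can describe the disturbance of the observable $X$ by

$$\begin{aligned}
\Delta_4^2(\boldsymbol{\kappa}, X, \rho) &= \operatorname{Tr}\,\bigl((X\otimes I_{B,C}) - U^*(I_{A,C}\otimes\bar\kappa_{\rho,s}(X))U\bigr)^2(\rho\otimes\rho_0) \\
&= \bigl(\|X\|^{(e)}_{\rho,s}\bigr)^2 - \bigl(\|\bar\kappa_{\rho,s}(X)\|^{(e)}_{\bar\kappa(\rho),s}\bigr)^2,
\end{aligned}\tag{7.26}$$

which may be found in a manner similar to (7.22). The four uncertainties given here are often confused, and all of them are often denoted by $\Delta^2(X)$. Some care is therefore necessary to distinguish these quantities.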


7.2.3 Uncertainty Relations 


The most famous uncertainty relation, by Robertson [5], is

$$\Delta_1(X, |u\rangle\langle u|)\,\Delta_1(Y, |u\rangle\langle u|) \ge \frac{|\langle u|[X,Y]|u\rangle|}{2}. \tag{7.27}$$

This may be generalized to

$$\Delta_1(X,\rho)\,\Delta_1(Y,\rho) \ge \frac{\operatorname{Tr}|\sqrt{\rho}\,[X,Y]\sqrt{\rho}|}{2}. \tag{7.28}$$

Indeed, the above inequality still holds if the right-hand side (RHS) is replaced by $\frac{|\operatorname{Tr}\rho[X,Y]|}{2}$. However, if the state is not a pure state, inequality (7.28) is a stronger requirement. For the rest of this section, we assume that $\rho$ is a density matrix, although the essential point is that $\rho \ge 0$ and not that its trace is equal to 1.

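Now, let us prove (7.28). The problem is reduced to the case of $\operatorname{Tr}\rho X = 0$ by replacing $X$ by $X - \operatorname{Tr}\rho X$ (and likewise for $Y$). Since

$$0 \le (X \pm iY)(X \pm iY)^* = (X \pm iY)(X \mp iY) = X^2 + Y^2 \mp i[X,Y],$$

we have $\sqrt{\rho}(X^2 + Y^2)\sqrt{\rho} \ge \pm i\sqrt{\rho}\,[X,Y]\sqrt{\rho}$. From Exercise 1.34 we thus obtain

$$\Delta_1^2(X,\rho) + \Delta_1^2(Y,\rho) \ge \operatorname{Tr}|i\sqrt{\rho}\,[X,Y]\sqrt{\rho}| = \operatorname{Tr}|\sqrt{\rho}\,[X,Y]\sqrt{\rho}|. \tag{7.29}$$

Replacing $X$ by $tX$, we see that

$$\Delta_1^2(X,\rho)\,t^2 - \operatorname{Tr}|\sqrt{\rho}\,[X,Y]\sqrt{\rho}|\,t + \Delta_1^2(Y,\rho) \ge 0.$$

Inequality (7.28) can then be obtained from the discriminant of this quadratic expression in $t$.

In addition, when the equality of (7.28) holds and $\operatorname{Tr}\rho X = \operatorname{Tr}\rho Y = 0$, the relation $(X + iY)\sqrt{\rho} = 0$ or $(X - iY)\sqrt{\rho} = 0$ holds.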
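
The difference between the two right-hand sides discussed above can be seen numerically. The following sketch is not part of the original text; the helper names and the random test data are illustrative assumptions. It evaluates both sides of (7.28) together with the weaker bound $|\operatorname{Tr}\rho[X,Y]|/2$.

```python
import numpy as np

def psd_sqrt(rho):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(rho)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def trace_norm(A):
    """Tr|A|, computed as the sum of the singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()

def delta1(X, rho):
    """Uncertainty (standard deviation) of the observable X in the state rho."""
    mean = np.trace(rho @ X).real
    D = X - mean * np.eye(len(X))
    return np.sqrt(max(np.trace(rho @ D @ D).real, 0.0))

def robertson_terms(X, Y, rho):
    """Return (LHS of (7.28), RHS of (7.28), weaker RHS |Tr rho [X,Y]| / 2)."""
    s = psd_sqrt(rho)
    comm = X @ Y - Y @ X
    lhs = delta1(X, rho) * delta1(Y, rho)
    strong = trace_norm(s @ comm @ s) / 2
    weak = abs(np.trace(rho @ comm)) / 2
    return lhs, strong, weak

def random_state(d, rng):
    """Random full-rank density matrix (illustrative test data)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def random_hermitian(d, rng):
    """Random Hermitian matrix (illustrative test data)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(0)
lhs, strong, weak = robertson_terms(random_hermitian(3, rng),
                                    random_hermitian(3, rng),
                                    random_state(3, rng))
print(lhs, strong, weak)  # the three values appear in decreasing order
```

For a mixed state the trace-norm bound is in general strictly larger than $|\operatorname{Tr}\rho[X,Y]|/2$, which is the sense in which (7.28) is the stronger requirement.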

The original uncertainty relation proposed by Heisenberg [6] was obtained through 
a gedanken experiment, and it relates the accuracy of measurements to the disturbance 
of measurements. The implications of the accuracy and the disturbance due to the 
measurement are not necessarily clear in (7.28). At least, it is incorrect to call (7.28) 
the Heisenberg uncertainty relation because it does not involve quantities related to 
measurement [7, 8]. 

One may think that Heisenberg's argument would be formulated as

$$\Delta_3(M, X, \rho)\,\Delta_4(\boldsymbol{\kappa}, Y, \rho) \ge \frac{\operatorname{Tr}|\sqrt{\rho}\,[X,Y]\sqrt{\rho}|}{2} \tag{7.30}$$

for a POVM $M$ and an instrument $\boldsymbol{\kappa}$ satisfying (7.8). However, this is in fact incorrect, for the following reason [7, 9, 10]. Consider the POVM $M$ that always gives the measurement outcome 0 without making any measurement. Then, $\Delta_3(M, X, \rho)$ is finite while $\Delta_4(\boldsymbol{\kappa}, Y, \rho)$ is 0. Therefore, this inequality does not hold in general. The primary reason for this is that the RHS has two quantities having no connection to the POVM $M$. Hence, we need to seek a more appropriate formulation.

Indeed, Heisenberg's gedanken experiment does not treat the above unnatural case. He considered the case when the measurement has the proper relation with the observable. That is, it seems that he considered the observable measured by the measurement $M$. In this case, it is more appropriate to address $\Delta_2(M, \rho)$ rather than $\Delta_3(M, X, \rho)$. Since the quantity $\Delta_2(M, \rho)$ is defined only with a POVM $M$ and a state $\rho$, it is better to lower bound the product $\Delta_2(M, \rho)\Delta_4(\boldsymbol{\kappa}, Y, \rho)$ by an amount determined by the POVM $M$, the observable $Y$, and the state $\rho$. Then, we can reformulate Heisenberg's argument as follows.
Theorem 7.4 Let $\boldsymbol{\kappa}$ be an instrument corresponding to the POVM $M$. A state $\rho$ on $\mathcal{H}_A$ then satisfies$^3$

$$\Delta_2(M, \rho)\,\Delta_4(\boldsymbol{\kappa}, Y, \rho) \ge \frac{\operatorname{Tr}|\sqrt{\rho}\,[O(M),Y]\sqrt{\rho}|}{2}. \tag{7.31}$$

A proof of this relation will be given later. This inequality means that the product of the error $\Delta_2(M, \rho)$ and the disturbance $\Delta_4(\boldsymbol{\kappa}, Y, \rho)$ is lower bounded by a quantity involving $M$ and $Y$.

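Remember that $\Delta_2(M, \rho)$ is lower bounded by $\Delta_1(O(M), \rho)$, as shown by (7.18), and $\Delta_4(\boldsymbol{\kappa}, Y, \rho)$ is upper bounded by $\Delta_1(Y, \rho)$, as shown by (7.24). When the POVM $M$ corresponding to $\boldsymbol{\kappa}$ is the spectral decomposition of $O(M)$, we have $\Delta_2(M, \rho) = \Delta_1(O(M), \rho)$. In this case, (7.31) implies that

$$\Delta_1(Y, \rho) \ge \Delta_4(\boldsymbol{\kappa}, Y, \rho) \ge \frac{\operatorname{Tr}|\sqrt{\rho}\,[O(M),Y]\sqrt{\rho}|}{2\Delta_1(O(M), \rho)}. \tag{7.32}$$

When the equality in (7.28) holds for $O(M)$ and $Y$, we have $\Delta_1(Y, \rho) = \Delta_4(\boldsymbol{\kappa}, Y, \rho)$.

In particular, when $Y$ is the SLD e representation $L_{\theta,s}$ of the derivative, due to (7.31), the information loss $\Delta_4(\boldsymbol{\kappa}, L_{\theta,s}, \rho)$ satisfies

$$\Delta_2(M, \rho)\,\Delta_4(\boldsymbol{\kappa}, L_{\theta,s}, \rho) \ge \frac{\operatorname{Tr}|\sqrt{\rho}\,[O(M),L_{\theta,s}]\sqrt{\rho}|}{2}.$$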

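Next, we consider a simultaneous measurement of two observables. For this purpose, we denote a POVM with two outcomes by $M = (\{M_\omega\}, \{x^1_\omega\}, \{x^2_\omega\})$. Then, the two average matrices are given by

$$O^1(M) := \sum_\omega x^1_\omega M_\omega, \qquad O^2(M) := \sum_\omega x^2_\omega M_\omega.$$

Theorem 7.5 The POVM $M = (\{M_\omega\}, \{x^1_\omega\}, \{x^2_\omega\})$ satisfies$^4$

$$\Delta_3(M, O^1(M), \rho)\,\Delta_3(M, O^2(M), \rho) \ge \frac{\operatorname{Tr}|\sqrt{\rho}\,[O^1(M),O^2(M)]\sqrt{\rho}|}{2}. \tag{7.33}$$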


3 As discussed in Sect. 6.2, $\Delta_4(\boldsymbol{\kappa}, Y, \rho)$ also has the meaning of the amount of the loss of the SLD Fisher information. Therefore, this inequality is interesting from the point of view of estimation theory. It indicates the naturalness of the SLD inner product. This is in contrast to the naturalness of the Bogoljubov inner product from a geometrical viewpoint.

4 The equality holds when an appropriate POVM $M$ is performed in a quantum two-level system [11]. For its more general equality condition, see Exercise 7.17.

There have also been numerous discussions relating to uncertainties, including their relation to quantum computation [9, 10].


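Proof of Theorem 7.4 The theorem is proven by considering an indirect measurement $\mathcal{I} = (\mathcal{H}_C, U, \rho_0, E)$ corresponding to the instrument $\boldsymbol{\kappa}$. Let $Z := \bar\kappa_{\rho,s}(Y)$. Then, from (7.26),

$$\Delta_4(\boldsymbol{\kappa}, Y, \rho) = \Delta_1(Y\otimes I_{B,C} - U^*(Z\otimes I_{A,C})U,\ \rho\otimes\rho_0). \tag{7.34}$$

Since the indirect measurement $\mathcal{I} = (\mathcal{H}_C, U, \rho_0, E)$ corresponds to the POVM $M$, we have $\operatorname{Tr}_{B,C}(I\otimes\sqrt{\rho_0})U^*(I_{A,B}\otimes O(E))U(I\otimes\sqrt{\rho_0}) = O(M)$ (Exercise 7.19). Referring to Exercise 1.25, we have

$$\begin{aligned}
&\operatorname{Tr}_{B,C}\sqrt{\rho\otimes\rho_0}\,[U^*(I_{A,B}\otimes O(E))U,\ Y\otimes I_{B,C} - U^*(Z\otimes I_{A,C})U]\sqrt{\rho\otimes\rho_0} \\
&= \operatorname{Tr}_{B,C}\sqrt{\rho\otimes\rho_0}\,[U^*(I_{A,B}\otimes O(E))U,\ Y\otimes I_{B,C}]\sqrt{\rho\otimes\rho_0} \\
&= \sqrt{\rho}\,[O(M), Y]\sqrt{\rho}.
\end{aligned}$$

Thus, using (7.28) in the last step,

$$\begin{aligned}
&\operatorname{Tr}|\sqrt{\rho}\,[O(M), Y]\sqrt{\rho}| \\
&= \operatorname{Tr}\bigl|\operatorname{Tr}_{B,C}\sqrt{\rho\otimes\rho_0}\,[U^*(I_{A,B}\otimes O(E))U,\ Y\otimes I_{B,C} - U^*(Z\otimes I_{A,C})U]\sqrt{\rho\otimes\rho_0}\bigr| \\
&\le \operatorname{Tr}\bigl|\sqrt{\rho\otimes\rho_0}\,[U^*(I_{A,B}\otimes O(E))U,\ Y\otimes I_{B,C} - U^*(Z\otimes I_{A,C})U]\sqrt{\rho\otimes\rho_0}\bigr| \\
&\le 2\,\Delta_1(U^*(I_{A,B}\otimes O(E))U,\ \rho\otimes\rho_0)\,\Delta_1(Y\otimes I_{B,C} - U^*(Z\otimes I_{A,C})U,\ \rho\otimes\rho_0).
\end{aligned}$$

Finally, (7.15) implies the equation $\Delta_2(M, \rho) = \Delta_1(U^*(I_{A,B}\otimes O(E))U,\ \rho\otimes\rho_0)$. Combining these relations with (7.34), we obtain (7.31). ∎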


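Proof of Theorem 7.5 We apply Exercise 5.7. Let us choose $\mathcal{H}_B$, a PVM $E$ on $\mathcal{H}\otimes\mathcal{H}_B$, and a state $\rho_0$ on $\mathcal{H}_B$ such that

$$\operatorname{Tr}\rho M_\omega = \operatorname{Tr}(\rho\otimes\rho_0)E_\omega$$

with respect to an arbitrary state $\rho$ on $\mathcal{H}$. Let $(\mathcal{H}_B, E, \rho_0)$ be the Naimark extension of $M$. Then, for $i = 1, 2$,

$$\Delta_3(M, O^i(M), \rho) = \Delta_1(O^i(E) - O^i(M)\otimes I_B,\ \rho\otimes\rho_0).$$

Since $[O^1(E), O^2(E)] = 0$, we have

$$\begin{aligned}
&[O^1(E) - O^1(M)\otimes I_B,\ O^2(E) - O^2(M)\otimes I_B] \\
&= -[O^1(E),\ O^2(M)\otimes I_B] - [O^1(M)\otimes I_B,\ O^2(E) - O^2(M)\otimes I_B].
\end{aligned}$$

Hence, applying (7.28) in the first step,

$$\begin{aligned}
&\Delta_1(O^1(E) - O^1(M)\otimes I_B,\ \rho\otimes\rho_0)\,\Delta_1(O^2(E) - O^2(M)\otimes I_B,\ \rho\otimes\rho_0) \\
&\ge \tfrac12\operatorname{Tr}\bigl|\sqrt{\rho\otimes\rho_0}\,[O^1(E) - O^1(M)\otimes I_B,\ O^2(E) - O^2(M)\otimes I_B]\sqrt{\rho\otimes\rho_0}\bigr| \\
&\ge \tfrac12\operatorname{Tr}\bigl|\operatorname{Tr}_B\sqrt{\rho\otimes\rho_0}\,[O^1(E) - O^1(M)\otimes I_B,\ O^2(E) - O^2(M)\otimes I_B]\sqrt{\rho\otimes\rho_0}\bigr| \\
&= \tfrac12\operatorname{Tr}\bigl|\operatorname{Tr}_B\sqrt{\rho\otimes\rho_0}\,\bigl(-[O^1(E),\ O^2(M)\otimes I_B] - [O^1(M)\otimes I_B,\ O^2(E) - O^2(M)\otimes I_B]\bigr)\sqrt{\rho\otimes\rho_0}\bigr| \\
&= \tfrac12\operatorname{Tr}\bigl|\operatorname{Tr}_B\sqrt{\rho\otimes\rho_0}\,[O^1(E),\ O^2(M)\otimes I_B]\sqrt{\rho\otimes\rho_0}\bigr|.
\end{aligned}$$

In the last equality, we used the fact that $\operatorname{Tr}_B(I\otimes\sqrt{\rho_0})(O^2(E) - O^2(M)\otimes I_B)(I\otimes\sqrt{\rho_0}) = 0$ and Exercise 1.25. Since $\operatorname{Tr}_B(I\otimes\sqrt{\rho_0})O^1(E)(I\otimes\sqrt{\rho_0}) = O^1(M)$ holds in the same way, the final expression equals $\tfrac12\operatorname{Tr}|\sqrt{\rho}\,[O^1(M), O^2(M)]\sqrt{\rho}|$, which is the RHS of (7.33). This completes the proof. ∎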
Exercises 


7.6 Show that the equality in (7.14) holds for the PVM M. 
7.7 Show (7.18). 
7.8 Show (7.19). 
7.9 Show (7.20). 


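7.10 Show (7.21) following the steps below.
(a) Let $(\mathcal{H}_C, \rho_0, U)$ be a Stinespring representation of $\kappa$. Show that any projection $E$ and any real number $x$ satisfy

$$\operatorname{Tr}\,(X\otimes I - x)U^*(I\otimes E)U(X\otimes I - x)\,\rho\otimes\rho_0 = \operatorname{Tr}\,(X - x)\kappa^*(E)(X - x)\rho.$$

(b) Show (7.21).


7.11 Let $(\mathcal{H}_B, E, \rho_0)$ be a Naimark extension of $M = (\{M_\omega\}, \{x_\omega\})$. Show that

$$\Delta_3^2(M, X, \rho) = \Delta_1^2(O(E) - X\otimes I_B,\ \rho\otimes\rho_0).$$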


7.12 Show that

$$\Delta_2(M, \rho) \le \Delta_3(M, X, \rho) + \Delta_1(O(M), \rho) \tag{7.35}$$

by following the steps below.
(a) Show that $\sqrt{a^2 + b^2 - c^2} \le a + d$ when $a, b, c, d \ge 0$, $a \ge c$, and $c + d \ge b$.
(b) Show (7.35) using the above.


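7.13 Show the following using (7.31) [7].

$$\Delta_2(M, \rho)\,\Delta_4(\boldsymbol{\kappa}, Y, \rho) + \Delta_1(O(M) - X, \rho)\,\Delta_1(Y, \rho) \ge \frac{\operatorname{Tr}|\sqrt{\rho}\,[X,Y]\sqrt{\rho}|}{2}. \tag{7.36}$$


7.14 Show the following using (7.36) [7].

$$\Delta_3(M, X, \rho)\,\Delta_4(\boldsymbol{\kappa}, Y, \rho) + \Delta_1(X, \rho)\,\Delta_4(\boldsymbol{\kappa}, Y, \rho) + \Delta_3(M, X, \rho)\,\Delta_1(Y, \rho) \ge \frac{\operatorname{Tr}|\sqrt{\rho}\,[X,Y]\sqrt{\rho}|}{2}. \tag{7.37}$$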


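7.15 Define the correlation between two Hermitian matrices $X$ and $Y$ under the state $\rho$ as

$$\mathrm{Cov}_\rho(X, Y) := \operatorname{Tr}\,(X - \operatorname{Tr}X\rho)\circ(Y - \operatorname{Tr}Y\rho)\,\rho, \tag{7.38}$$

which is a quantum analogue of the covariance $\mathrm{Cov}_p(X, Y)$ defined in (2.93). Show that

$$\det\begin{pmatrix}\mathrm{Cov}_\rho(X, X) & \mathrm{Cov}_\rho(X, Y)\\ \mathrm{Cov}_\rho(X, Y) & \mathrm{Cov}_\rho(Y, Y)\end{pmatrix} \ge \left(\frac{\operatorname{Tr}|\sqrt{\rho}\,[X,Y]\sqrt{\rho}|}{2}\right)^2, \tag{7.39}$$

which is a stronger inequality than (7.28), using (7.28).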


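7.16 Show that the equality in (7.39) always holds if $\mathcal{H} = \mathbb{C}^2$ [11] by following the steps below. This fact shows the equality in (7.28) when $\mathrm{Cov}_\rho(X, Y) = 0$ for $\mathcal{H} = \mathbb{C}^2$. In the following proof, we first treat the case where $\operatorname{Tr}X = \operatorname{Tr}Y = 0$, which implies that $X$, $Y$, and $\rho$ are written as $X = \sum_{i=1}^3 x_iS_i$, $Y = \sum_{i=1}^3 y_iS_i$, and $\rho = \frac12\bigl(I + \sum_{i=1}^3 a_iS_i\bigr)$. After this special case, we consider the general case.

(a) Show that $\mathrm{Cov}_\rho(X, Y) = \langle x, y\rangle - \langle x, a\rangle\langle a, y\rangle$.

(b) Let $z$ be the vector product (outer product) of $x$ and $y$, i.e., $z := x\times y = (x_2y_3 - x_3y_2,\ x_3y_1 - x_1y_3,\ x_1y_2 - x_2y_1)$. Show that $-\frac{i}{2}[X, Y] = Z := \sum_{i=1}^3 z_iS_i$.

(c) Show that $\operatorname{Tr}|\sqrt{\rho}\,Z\sqrt{\rho}| = \sqrt{\|z\|^2 - \|z\times a\|^2}$.

(d) Show that the equality in (7.39) is equivalent to

$$\bigl(\|x\|^2 - \langle x, a\rangle^2\bigr)\bigl(\|y\|^2 - \langle y, a\rangle^2\bigr) - \bigl(\langle x, y\rangle - \langle x, a\rangle\langle a, y\rangle\bigr)^2 = \|x\times y\|^2 - \|(x\times y)\times a\|^2 \tag{7.40}$$

when $\operatorname{Tr}X = \operatorname{Tr}Y = 0$ in a quantum two-level system.

(e) Show (7.40) if $\langle x, y\rangle = \langle x, a\rangle = 0$.

(f) Show that there exists a $2\times 2$ matrix $(b_{i,j})$ with determinant 1 such that the vectors $\tilde x := b_{1,1}x + b_{1,2}y$ and $\tilde y := b_{2,1}x + b_{2,2}y$ satisfy $\langle\tilde x, \tilde y\rangle = \langle\tilde x, a\rangle = 0$.

(g) Show (7.40) for arbitrary vectors $x$, $y$, $a$.

(h) Show that the equality in (7.39) still holds even if $\operatorname{Tr}X = \operatorname{Tr}Y = 0$ does not hold.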
7.17 Show that the POVM $M_{X,Y,\rho}$ below satisfies $O^1(M) = X$, $O^2(M) = Y$. Further, show the equality in (7.33) when $X$, $Y$, $\rho$ satisfy the equality in (7.28).

Construction of the POVM $M_{X,Y,\rho}$: Let the spectral decompositions of $X$ and $Y$ be $X = \sum_i x_iE_{X,i}$ and $Y = \sum_j y_jE_{Y,j}$, respectively. Define the POVM $M_{X,Y,\rho}$ with the probability space $\Omega = \{i\}\cup\{j\}$ for $p\in(0,1)$ as follows. Let $M_{X,Y,\rho,i} := pE_{X,i}$ and $M_{X,Y,\rho,j} := (1-p)E_{Y,j}$. Define

$$x^1_i := \tfrac{1}{p}(x_i - \operatorname{Tr}\rho X) + \operatorname{Tr}\rho X, \quad x^1_j := \operatorname{Tr}\rho X, \quad x^2_i := \operatorname{Tr}\rho Y, \quad x^2_j := \tfrac{1}{1-p}(y_j - \operatorname{Tr}\rho Y) + \operatorname{Tr}\rho Y.$$

The POVM is then defined as $M_{X,Y,\rho} := \{(M_{X,Y,\rho,i}, x^1_i, x^2_i)\}\cup\{(M_{X,Y,\rho,j}, x^1_j, x^2_j)\}$.


7.18 Using (7.31), show that the following two conditions are equivalent for two Hermitian matrices $X$ and $Y$.

① $[X, Y] = 0$.
② There exist an instrument $\boldsymbol{\kappa} = \{\kappa_\omega\}$ and a set $\{x_\omega\}$ such that the following two conditions

$$\operatorname{Tr}\rho X = \sum_\omega x_\omega\operatorname{Tr}\kappa_\omega(\rho), \qquad \Delta_4(\boldsymbol{\kappa}, Y, \rho) = 0$$

hold for an arbitrary state $\rho$. The first equation implies that the instrument $\boldsymbol{\kappa}$ corresponds to the observable $X$. The second equation implies that the instrument $\boldsymbol{\kappa}$ does not disturb the observable $Y$.


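7.19 Show that

$$\operatorname{Tr}_{B,C}\bigl((I_A\otimes\sqrt{\rho_0})\,U^*(I_{A,B}\otimes O(E))\,U\,(I_A\otimes\sqrt{\rho_0})\bigr) = O(M) \tag{7.41}$$

for an indirect measurement $\mathcal{I} = (\mathcal{H}_C, U, \rho_0, E)$ corresponding to $M$.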


7.20 Given two state evolutions $\kappa_1$ and $\kappa_2$, show the following items.

(a) Show that $\Delta_4^2(\lambda\kappa_1 + (1-\lambda)\kappa_2, X, \rho) \ge \lambda\Delta_4^2(\kappa_1, X, \rho) + (1-\lambda)\Delta_4^2(\kappa_2, X, \rho)$.
(b) Show that the equality holds when the space spanned by the supports of $\kappa_1(X\circ\rho)$ and $\kappa_1(\rho)$ is orthogonal to the space spanned by the supports of $\kappa_2(X\circ\rho)$ and $\kappa_2(\rho)$.


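7.21 Let two Hermitian matrices $X$ and $Y$ on $\mathcal{H}$ satisfy the equality in (7.28). Let $X = \sum_{i=1}^k x_iE_i$, and define the POVM $M = \{(M_i, \hat x_i)\}_{i=0}^k$ according to the conditions $\hat x_0 := \operatorname{Tr}X\rho$, $\hat x_i := \operatorname{Tr}X\rho + \frac{1}{p}(x_i - \operatorname{Tr}X\rho)$, $M_0 := (1-p)I$, and $M_i := pE_i$. Now consider another space $\mathcal{H}'$ equivalent to $\mathcal{H}$, and the unitary map $U$ from $\mathcal{H}$ to $\mathcal{H}'$. Define $\boldsymbol{\kappa} = \{\kappa_i\}$ according to $\kappa_0(\rho) := (1-p)U\rho U^*$ and $\kappa_i(\rho) := \sqrt{M_i}\,\rho\sqrt{M_i}$. That is, the output system of $\boldsymbol{\kappa}$ is $\mathcal{H}\oplus\mathcal{H}'$. Show that $O(M) = X$ and that the equality of (7.31) holds.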
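
The construction in Exercise 7.17 can be sketched numerically. The code below is an illustration only, not part of the original text: the function names, the choice of $X$ and $Y$ as Pauli matrices, the state, and the value $p = 0.4$ are assumptions, and rank-one eigenprojectors are used in place of the spectral projections (which gives the same average matrices).

```python
import numpy as np

def psd_sqrt(rho):
    """Square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(rho)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def trace_norm(A):
    """Tr|A| as the sum of singular values."""
    return np.linalg.svd(A, compute_uv=False).sum()

def eigen_projectors(X):
    """Eigenvalues of X and one rank-one projector per eigenvector."""
    w, V = np.linalg.eigh(X)
    return w, [np.outer(V[:, k], V[:, k].conj()) for k in range(len(w))]

def build_povm(X, Y, rho, p):
    """POVM elements with the two outcome labels x^1, x^2 of Exercise 7.17."""
    mx, my = np.trace(rho @ X).real, np.trace(rho @ Y).real
    elements, x1, x2 = [], [], []
    xs, EX = eigen_projectors(X)
    for x, E in zip(xs, EX):                 # outcomes i: scaled PVM of X
        elements.append(p * E)
        x1.append((x - mx) / p + mx)
        x2.append(my)
    ys, EY = eigen_projectors(Y)
    for y, E in zip(ys, EY):                 # outcomes j: scaled PVM of Y
        elements.append((1 - p) * E)
        x1.append(mx)
        x2.append((y - my) / (1 - p) + my)
    return elements, np.array(x1), np.array(x2)

def delta3(Ms, xs, X, rho):
    """Deviation of the POVM from X, the square root of (7.17)."""
    I = np.eye(X.shape[0])
    total = sum(np.trace((x * I - X) @ M @ (x * I - X) @ rho).real
                for x, M in zip(xs, Ms))
    return np.sqrt(max(total, 0.0))

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
rho = np.array([[0.65, 0.1], [0.1, 0.35]], dtype=complex)
Ms, x1, x2 = build_povm(X, Y, rho, p=0.4)
O1 = sum(a * M for a, M in zip(x1, Ms))
O2 = sum(a * M for a, M in zip(x2, Ms))
s = psd_sqrt(rho)
bound = trace_norm(s @ (X @ Y - Y @ X) @ s) / 2
product = delta3(Ms, x1, X, rho) * delta3(Ms, x2, Y, rho)
print(product, bound)  # left- and right-hand sides of (7.33)
```

A direct calculation gives $\Delta_3^2(M, X, \rho) = \frac{1-p}{p}\Delta_1^2(X, \rho)$ and $\Delta_3^2(M, Y, \rho) = \frac{p}{1-p}\Delta_1^2(Y, \rho)$ for this construction, so the product does not depend on $p$ and reduces to the left-hand side of (7.28).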
7.3 Entropic Uncertainty Relation


Even when the initial state $\rho$ is pure, the outcome of a measurement $M$ is not necessarily deterministic. In this case, the uncertainty of the outcome can be evaluated by use of the entropy $H(P^M_\rho)$. Of course, if the elements of the PVM $M$ commute with the state $|x\rangle\langle x|$, the outcome is deterministic, i.e., the entropy $H(P^M_{|x\rangle\langle x|})$ is zero. However, when two non-commutative PVMs $M$ and $M'$ are given, there is in general no pure state $|x\rangle\langle x|$ satisfying $H(P^M_{|x\rangle\langle x|}) = H(P^{M'}_{|x\rangle\langle x|}) = 0$. Similar to the uncertainty relation concerning the square errors, we have the following relation between the two quantities $H(P^M_\rho)$ and $H(P^{M'}_\rho)$.


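Theorem 7.6 (Entropic Uncertainty Relation, Maassen and Uffink [12]) Let $M$ and $M'$ be the PVMs composed of the bases $\{u_j\}$ and $\{v_l\}$ of $\mathcal{H}_A$. Any state $\rho$ satisfies

$$H(\kappa_M(\rho)) + H(\kappa_{M'}(\rho)) = H(P^M_\rho) + H(P^{M'}_\rho) \ge -\log c + H(\rho), \tag{7.42}$$

where $c := \max_{j,l}|\langle u_j|v_l\rangle|^2$. The equality holds when $|\langle u_j|v_l\rangle|$ does not depend on $l, j$ and the matrix $A + B$ is commutative with $\rho$, where

$$A := \sum_j \log\langle u_j|\rho|u_j\rangle\,|u_j\rangle\langle u_j|, \qquad B := \sum_l \log\langle v_l|\rho|v_l\rangle\,|v_l\rangle\langle v_l|. \tag{7.43}$$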
j 1 


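Proof The Golden-Thompson trace inequality (Lemma 5.4) yields that $\operatorname{Tr}e^Ae^B \ge \operatorname{Tr}e^{A+B}$. We also have

$$\begin{aligned}
\operatorname{Tr}e^Ae^B &= \sum_{l,j}\langle u_j|\rho|u_j\rangle\langle v_l|\rho|v_l\rangle\operatorname{Tr}|v_l\rangle\langle v_l|u_j\rangle\langle u_j| \\
&= \sum_{l,j}\langle u_j|\rho|u_j\rangle\langle v_l|\rho|v_l\rangle|\langle v_l|u_j\rangle|^2 \le \sum_{l,j}\langle u_j|\rho|u_j\rangle\langle v_l|\rho|v_l\rangle\,c = c.
\end{aligned}\tag{7.44}$$

Choosing the other state $\sigma := e^{A+B}/\operatorname{Tr}e^{A+B}$, we have $\operatorname{Tr}\rho\log\rho - \operatorname{Tr}\rho\log\sigma = D(\rho\|\sigma) \ge 0$. Hence, combining the above relations, we obtain

$$\begin{aligned}
H(P^M_\rho) + H(P^{M'}_\rho) &= -\operatorname{Tr}\rho(A + B) = -\operatorname{Tr}\rho\log\sigma - \log\operatorname{Tr}e^{A+B} \\
&\ge -\operatorname{Tr}\rho\log\rho - \log\operatorname{Tr}e^Ae^B = H(\rho) - \log\operatorname{Tr}e^Ae^B \ge H(\rho) - \log c,
\end{aligned}$$

which implies (7.42). The equality of the first inequality holds when the matrix $A + B$ is commutative with $\rho$. The equality of the second inequality, i.e., that of (7.44), holds when $|\langle u_j|v_l\rangle|$ does not depend on $l$ and $j$. Hence, we obtain the required equality condition. ∎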
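
Inequality (7.42) can be checked numerically for concrete bases. The following sketch is not part of the original text; the function names and the example bases are illustrative, natural logarithms are used, and each basis is passed as a unitary matrix whose columns are the basis vectors.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (natural logarithm) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-15]
    return float(-(p * np.log(p)).sum())

def von_neumann(rho):
    """von Neumann entropy H(rho)."""
    return shannon(np.linalg.eigvalsh(rho))

def outcome_dist(basis, rho):
    """Probabilities <u_j|rho|u_j> for the columns u_j of `basis`."""
    return np.einsum('ij,ik,kj->j', basis.conj(), rho, basis).real

def maassen_uffink(U, V, rho):
    """Return (LHS, RHS) of (7.42) for the PVMs given by the columns of U and V."""
    c = np.max(np.abs(U.conj().T @ V)) ** 2
    lhs = shannon(outcome_dist(U, rho)) + shannon(outcome_dist(V, rho))
    rhs = -np.log(c) + von_neumann(rho)
    return lhs, rhs

# Mutually unbiased bases of a qubit: computational and Hadamard bases.
U = np.eye(2)
V = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
print(maassen_uffink(U, V, np.eye(2) / 2))          # completely mixed state
print(maassen_uffink(U, V, np.diag([1.0, 0.0])))    # pure state |0><0|
```

For these two mutually unbiased bases, both the completely mixed state and the basis state $|0\rangle\langle 0|$ attain equality in (7.42), which is consistent with the stated equality condition.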


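As a generalization of Theorem 7.6, we can show the following theorem by replacing the entropies by the conditional Rényi entropies.

Theorem 7.7 ([13-15]) Given a state $\rho$ on the system $\mathcal{H}_A\otimes\mathcal{H}_B\otimes\mathcal{H}_C$, let $\{M_x\}_x$ and $\{N_y\}_y$ be two POVMs on $\mathcal{H}_A$. We define the overlap$^5$ $c := \max_{x,y}\|\sqrt{M_x}\sqrt{N_y}\|^2$ and consider the post-measurement states

$$\rho_{XBC} := \sum_x |x\rangle_X\,{}_X\langle x|\otimes\operatorname{Tr}_A(M_x\otimes I_{BC})\rho, \tag{7.45}$$

$$\rho_{YBC} := \sum_y |y\rangle_Y\,{}_Y\langle y|\otimes\operatorname{Tr}_A(N_y\otimes I_{BC})\rho. \tag{7.46}$$

Then, the following relations hold:

$$H_{\alpha|\rho}(X|B) + H_{\beta|\rho}(Y|C) \ge \log\frac{1}{c}, \quad\text{for } \alpha, \beta\in[0,2],\ \alpha + \beta = 2, \tag{7.47}$$

$$\tilde H^{\uparrow}_{\alpha|\rho}(X|B) + \tilde H^{\uparrow}_{\beta|\rho}(Y|C) \ge \log\frac{1}{c}, \quad\text{for } \alpha, \beta\in\bigl[\tfrac12,\infty\bigr],\ \frac{1}{\alpha} + \frac{1}{\beta} = 2, \tag{7.48}$$

$$H^{\uparrow}_{\alpha|\rho}(X|B) + \tilde H_{\beta|\rho}(Y|C) \ge \log\frac{1}{c}, \quad\text{for } \alpha\in[0,2],\ \beta\in\bigl[\tfrac12,\infty\bigr],\ \alpha\cdot\beta = 1. \tag{7.49}$$

In particular, when $\alpha = \beta = 1$, we have [16, 17]

$$H_\rho(X|B) + H_\rho(Y|C) \ge \log\frac{1}{c}. \tag{7.50}$$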


In Theorem 7.7, we introduce additional classical systems $\mathcal{H}_X$ and $\mathcal{H}_Y$ to give our arguments (7.47)-(7.50). When the POVMs $M = \{M_x\}_x$ and $N = \{N_y\}_y$ are rank-one PVMs, these relations can be stated without the additional classical systems $\mathcal{H}_X$ and $\mathcal{H}_Y$. Then, (7.50) can be written as

$$H_{\kappa_M(\rho)}(A|B) + H_{\kappa_N(\rho)}(A|C) \ge \log\frac{1}{c}. \tag{7.51}$$

Now, we focus on mutually unbiased bases. Two bases $\{u_j\}$ and $\{v_l\}$ on a $d$-dimensional system $\mathcal{H}_A$ are called mutually unbiased when $|\langle u_j|v_l\rangle|^2 = 1/d$ for all $j$ and $l$. Now, we apply (7.51) to the POVMs $M$ and $N$ given by the measurements based on the mutually unbiased bases $\{u_j\}$ and $\{v_l\}$. Then, we have $c = 1/d$. When the relation $H_{\kappa_M(\rho)}(A|B) = 0$ holds, i.e., the outcome of $M$ on the system $\mathcal{H}_A$ is completely determined by the optimal measurement on $\mathcal{H}_B$, we obtain $H_{\kappa_N(\rho)}(A|C) = \log d$, which implies that $D(\kappa_N(\rho_{A,C})\|\rho_{A,\mathrm{mix}}\otimes\rho_C) = 0$, i.e.,

$$\kappa_N(\rho_{A,C}) = \rho_{A,\mathrm{mix}}\otimes\rho_C. \tag{7.52}$$


Proof Since any conditional Rényi entropy satisfies the relations given in Exercise 5.49, it is sufficient to show the case when $\rho$ is a pure state. We define the isometries $V_X$ and $V_Y$ from $\mathcal{H}_A$ to either $\mathcal{H}_X\otimes\mathcal{H}_{X'}\otimes\mathcal{H}_A$ or $\mathcal{H}_Y\otimes\mathcal{H}_{Y'}\otimes\mathcal{H}_A$, respectively, as

$$V_X|a\rangle := \sum_x |x\rangle_X\otimes|x\rangle_{X'}\otimes\sqrt{M_x}|a\rangle, \tag{7.53}$$

$$V_Y|a\rangle := \sum_y |y\rangle_Y\otimes|y\rangle_{Y'}\otimes\sqrt{N_y}|a\rangle. \tag{7.54}$$

5 This definition of $c$ is the generalization of that in Theorem 7.6. See Exercise 7.22.


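We apply the duality relation given in Theorem 5.13 to the pure state $V_Y\rho V_Y^\dagger$ on the system $\mathcal{H}_Y\otimes\mathcal{H}_{Y'}\otimes\mathcal{H}_A\otimes\mathcal{H}_B\otimes\mathcal{H}_C$. Then, we have

$$\begin{aligned}
H_{\beta|\rho}(Y|C) &= H_{\beta|V_Y\rho V_Y^\dagger}(Y|C) = -H_{\alpha|V_Y\rho V_Y^\dagger}(Y|Y'AB), \\
\tilde H^{\uparrow}_{\beta|\rho}(Y|C) &= \tilde H^{\uparrow}_{\beta|V_Y\rho V_Y^\dagger}(Y|C) = -\tilde H^{\uparrow}_{\alpha|V_Y\rho V_Y^\dagger}(Y|Y'AB), \\
\tilde H_{\beta|\rho}(Y|C) &= \tilde H_{\beta|V_Y\rho V_Y^\dagger}(Y|C) = -H^{\uparrow}_{\alpha|V_Y\rho V_Y^\dagger}(Y|Y'AB)
\end{aligned}$$

when $\alpha$ and $\beta$ satisfy the respective condition. Since

$$H_{\alpha|\rho}(X|B) = H_{\alpha|V_X\rho V_X^\dagger}(X|B), \quad \tilde H^{\uparrow}_{\alpha|\rho}(X|B) = \tilde H^{\uparrow}_{\alpha|V_X\rho V_X^\dagger}(X|B), \quad H^{\uparrow}_{\alpha|\rho}(X|B) = H^{\uparrow}_{\alpha|V_X\rho V_X^\dagger}(X|B),$$

it is sufficient to show that

$$H_{\alpha|V_X\rho V_X^\dagger}(X|B) + \log c \ge -H_{\alpha|V_Y\rho V_Y^\dagger}(Y|Y'AB), \tag{7.55}$$

$$\tilde H^{\uparrow}_{\alpha|V_X\rho V_X^\dagger}(X|B) + \log c \ge -\tilde H^{\uparrow}_{\alpha|V_Y\rho V_Y^\dagger}(Y|Y'AB), \tag{7.56}$$

$$H^{\uparrow}_{\alpha|V_X\rho V_X^\dagger}(X|B) + \log c \ge -H^{\uparrow}_{\alpha|V_Y\rho V_Y^\dagger}(Y|Y'AB). \tag{7.57}$$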


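Here, we will show (7.57). For this purpose, using the relation $\sqrt{N_y}M_x\sqrt{N_y} \le cI_A$, we evaluate $\operatorname{Tr}_{X'A}V_XV_Y^\dagger(I_Y\otimes\sigma_{Y'AB})V_YV_X^\dagger$ as follows:

$$\begin{aligned}
&\operatorname{Tr}_{X'A}V_XV_Y^\dagger(I_Y\otimes\sigma_{Y'AB})V_YV_X^\dagger \\
&= \operatorname{Tr}_{X'A}V_X\Bigl(\sum_{y,y'}\bigl({}_Y\langle y|\otimes{}_{Y'}\langle y|\otimes\sqrt{N_y}\bigr)(I_Y\otimes\sigma_{Y'AB})\bigl(|y'\rangle_Y\otimes|y'\rangle_{Y'}\otimes\sqrt{N_{y'}}\bigr)\Bigr)V_X^\dagger \\
&= \sum_x |x\rangle_X\,{}_X\langle x|\otimes\operatorname{Tr}_A\sqrt{M_x}\Bigl(\sum_y\bigl({}_{Y'}\langle y|\otimes\sqrt{N_y}\bigr)\sigma_{Y'AB}\bigl(|y\rangle_{Y'}\otimes\sqrt{N_y}\bigr)\Bigr)\sqrt{M_x} \\
&= \sum_x |x\rangle_X\,{}_X\langle x|\otimes\sum_y\operatorname{Tr}_{Y'A}\bigl[\bigl(|y\rangle_{Y'}\,{}_{Y'}\langle y|\otimes\sqrt{N_y}M_x\sqrt{N_y}\bigr)\sigma_{Y'AB}\bigr] \\
&\le \sum_x |x\rangle_X\,{}_X\langle x|\otimes\sum_y\operatorname{Tr}_{Y'A}\bigl[\bigl(|y\rangle_{Y'}\,{}_{Y'}\langle y|\otimes cI_A\bigr)\sigma_{Y'AB}\bigr] \\
&= I_X\otimes\operatorname{Tr}_{Y'A}\bigl[(I_{Y'}\otimes cI_A)\sigma_{Y'AB}\bigr] = cI_X\otimes\sigma_B.
\end{aligned}\tag{7.58}$$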
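Since $V_Y$ is an isometry, $V_YV_Y^\dagger$ is a projection to the image of $V_Y$, which can be regarded as a subspace of $\mathcal{H}_Y\otimes\mathcal{H}_{Y'}\otimes\mathcal{H}_A\otimes\mathcal{H}_B$. Since $V_Y\rho_{AB}V_Y^\dagger = (V_Y\rho V_Y^\dagger)_{YY'AB}$, we have $(V_YV_Y^\dagger)(V_Y\rho V_Y^\dagger)_{YY'AB}(V_YV_Y^\dagger) = (V_YV_Y^\dagger)V_Y\rho_{AB}V_Y^\dagger(V_YV_Y^\dagger) = V_Y\rho_{AB}V_Y^\dagger$. Hence, (b) of Exercise 5.25 yields the following relation (a):

$$\begin{aligned}
&D_\alpha\bigl((V_Y\rho V_Y^\dagger)_{YY'AB}\,\big\|\,I_Y\otimes\sigma_{Y'AB}\bigr) \\
&\overset{(a)}{\ge} D_\alpha\bigl(V_Y\rho_{AB}V_Y^\dagger\,\big\|\,(V_YV_Y^\dagger)(I_Y\otimes\sigma_{Y'AB})(V_YV_Y^\dagger)\bigr) \\
&\overset{(b)}{=} D_\alpha\bigl(\rho_{AB}\,\big\|\,V_Y^\dagger(I_Y\otimes\sigma_{Y'AB})V_Y\bigr) \\
&\overset{(c)}{=} D_\alpha\bigl(V_X\rho_{AB}V_X^\dagger\,\big\|\,V_XV_Y^\dagger(I_Y\otimes\sigma_{Y'AB})V_YV_X^\dagger\bigr) \\
&\overset{(d)}{\ge} D_\alpha\bigl(\operatorname{Tr}_{X'A}V_X\rho_{AB}V_X^\dagger\,\big\|\,\operatorname{Tr}_{X'A}V_XV_Y^\dagger(I_Y\otimes\sigma_{Y'AB})V_YV_X^\dagger\bigr) \\
&\overset{(e)}{\ge} D_\alpha\bigl((V_X\rho V_X^\dagger)_{XB}\,\big\|\,cI_X\otimes\sigma_B\bigr) \\
&\overset{(f)}{=} -\log c + D_\alpha\bigl(\operatorname{Tr}_{X'A}V_X\rho_{AB}V_X^\dagger\,\big\|\,I_X\otimes\sigma_B\bigr),
\end{aligned}$$

where (b), (c), (d), (e), and (f) follow from (d), (d), (a), (e), and (c) of Exercise 5.25, respectively. In particular, to derive (e), we employ (7.58) as well as (e) of Exercise 5.25. Thus,

$$\begin{aligned}
-H^{\uparrow}_{\alpha|V_Y\rho V_Y^\dagger}(Y|Y'AB) &= \min_{\sigma_{Y'AB}}D_\alpha\bigl((V_Y\rho V_Y^\dagger)_{YY'AB}\,\big\|\,I_Y\otimes\sigma_{Y'AB}\bigr) \\
&\ge -\log c + \min_{\sigma_B}D_\alpha\bigl(\operatorname{Tr}_{X'A}V_X\rho_{AB}V_X^\dagger\,\big\|\,I_X\otimes\sigma_B\bigr) \\
&= -H^{\uparrow}_{\alpha|V_X\rho V_X^\dagger}(X|B) - \log c,
\end{aligned}$$

which implies (7.57). (7.55) can be shown by replacing $\sigma_{Y'AB}$ by $\rho_{Y'AB}$. (7.56) can be shown by replacing $D_\alpha$ by $\underline{D}_\alpha$. ∎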


Exercise 


7.22 Show that $\|\sqrt{M_x}\sqrt{N_y}\| = |\langle u_x|v_y\rangle|$ when $M_x = |u_x\rangle\langle u_x|$ and $N_y = |v_y\rangle\langle v_y|$.
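
The overlap $c$ of Theorem 7.7 is straightforward to evaluate. The sketch below is illustrative and not part of the original text: the function names and the example POVMs are assumptions, and $\|\cdot\|$ is taken to be the operator (spectral) norm.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def overlap(Ms, Ns):
    """c = max_{x,y} || sqrt(M_x) sqrt(N_y) ||^2 with the spectral norm."""
    return max(np.linalg.norm(psd_sqrt(M) @ psd_sqrt(N), 2) ** 2
               for M in Ms for N in Ns)

def rank_one_pvm(basis):
    """PVM {|u_j><u_j|} built from the columns u_j of a unitary matrix."""
    return [np.outer(basis[:, j], basis[:, j].conj()) for j in range(basis.shape[1])]

def trine_povm():
    """Hypothetical non-projective example: (2/3)|v_k><v_k| for three real unit vectors."""
    Ms = []
    for k in range(3):
        t = 2 * np.pi * k / 3
        v = np.array([np.cos(t / 2), np.sin(t / 2)])
        Ms.append((2 / 3) * np.outer(v, v))
    return Ms

U = np.eye(2)
V = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
c_pvm = overlap(rank_one_pvm(U), rank_one_pvm(V))
c_trine = overlap(trine_povm(), trine_povm())
print(c_pvm, c_trine)
```

For the two rank-one PVMs the value agrees with $\max_{x,y}|\langle u_x|v_y\rangle|^2 = 1/2$, as Exercise 7.22 states. For the trine POVM measured twice, the maximum is attained at $x = y$ and equals $(2/3)^2$.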


7.4 Measurements with Negligible State Reduction 


As discussed previously, any measurement inevitably changes the state of the measured system. In this section, we propose a method for constructing a measurement with negligible state reduction. When a measurement described by an instrument $\boldsymbol{\kappa}$ is applied on a system in a state $\rho$, the amount of the state reduction is characterized by

$$\varepsilon(\rho, \boldsymbol{\kappa}) := \sum_\omega\operatorname{Tr}\kappa_\omega(\rho)\; b^2\!\left(\rho,\ \frac{1}{\operatorname{Tr}\kappa_\omega(\rho)}\kappa_\omega(\rho)\right), \tag{7.59}$$
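where $b$ is the Bures distance. In the following discussion, we consider the typical state reduction $\boldsymbol{\kappa}_M$ of a POVM $M = \{M_i\}$. Then, the amount of the state reduction can be found to be [18]

$$\begin{aligned}
\varepsilon(\rho, \boldsymbol{\kappa}_M) &= \sum_i\operatorname{Tr}\rho M_i\left(1 - \operatorname{Tr}\sqrt{\rho^{1/2}\,\frac{\sqrt{M_i}\,\rho\sqrt{M_i}}{\operatorname{Tr}\rho M_i}\,\rho^{1/2}}\right) \\
&= 1 - \sum_i\sqrt{\operatorname{Tr}\rho M_i}\,\operatorname{Tr}\sqrt{\rho^{1/2}\sqrt{M_i}\,\rho\sqrt{M_i}\,\rho^{1/2}} \\
&= 1 - \sum_i\sqrt{\operatorname{Tr}\rho M_i}\,\operatorname{Tr}\rho^{1/2}\sqrt{M_i}\,\rho^{1/2} \\
&= 1 - \sum_i\sqrt{\operatorname{Tr}\rho M_i}\,\operatorname{Tr}\rho\sqrt{M_i} \le 1 - \sum_i\bigl(\operatorname{Tr}\rho\sqrt{M_i}\bigr)^2,
\end{aligned}\tag{7.60}$$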


Exe. 3.3 


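where we used the quantum version of Jensen's inequality in the last formula. Conversely, since $M_i \le I$, we have $M_i \le \sqrt{M_i}$, and therefore $\varepsilon(\rho, \boldsymbol{\kappa}_M) \ge 1 - \sum_i\bigl(\operatorname{Tr}\rho\sqrt{M_i}\bigr)^{3/2}$. When there is no possibility of ambiguity, we will abbreviate $\varepsilon(\rho, \boldsymbol{\kappa}_M)$ to $\varepsilon(\rho, M)$. In particular, if $\rho_j$ is generated with a probability distribution $p = \{p_j\}$, the average of $\varepsilon(\rho_j, \boldsymbol{\kappa}_M)$ can be evaluated as

$$\begin{aligned}
\varepsilon(p, \boldsymbol{\kappa}_M) := \sum_j p_j\,\varepsilon(\rho_j, \boldsymbol{\kappa}_M) &\le 1 - \sum_j p_j\sum_i\bigl(\operatorname{Tr}\rho_j\sqrt{M_i}\bigr)^2 \\
&\le 1 - \sum_i\Bigl(\sum_j p_j\operatorname{Tr}\rho_j\sqrt{M_i}\Bigr)^2 = 1 - \sum_i\bigl(\operatorname{Tr}\bar\rho_p\sqrt{M_i}\bigr)^2,
\end{aligned}\tag{7.61}$$

where $\bar\rho_p := \sum_j p_j\rho_j$. Hence, the analysis of $\varepsilon(p, \boldsymbol{\kappa}_M)$ is reduced to that of $1 - \sum_i\bigl(\operatorname{Tr}\bar\rho_p\sqrt{M_i}\bigr)^2$.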
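Let us now consider which POVM $M$ has a negligible state reduction. For this analysis, we focus on the number $i_m := \operatorname{argmax}_i\operatorname{Tr}M_i\rho$ and the probability $P^{\max}_{\rho,M} := \operatorname{Tr}M_{i_m}\rho$. Then, we obtain

$$\begin{aligned}
(1 - P^{\max}_{\rho,M})(1 + P^{\max}_{\rho,M}) = 1 - \bigl(P^{\max}_{\rho,M}\bigr)^2 &\ge 1 - \bigl(\operatorname{Tr}\rho\sqrt{M_{i_m}}\bigr)^2 \\
&\ge 1 - \sum_i\bigl(\operatorname{Tr}\rho\sqrt{M_i}\bigr)^2 \ge \varepsilon(\rho, M).
\end{aligned}$$

Therefore, we see that $\varepsilon(\rho, M)$ approaches 0 when $P^{\max}_{\rho,M}$ approaches 1.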
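
The closed form in (7.60) and the bounds around it can be checked against the definition (7.59). The following sketch is not part of the original text. It assumes the convention $b^2(\rho, \sigma) = 1 - \operatorname{Tr}|\sqrt{\rho}\sqrt{\sigma}|$ for the Bures distance, under which the first line of (7.60) follows, and the typical state reduction $\rho \mapsto \sqrt{M_i}\rho\sqrt{M_i}$. Function names and the example POVM are illustrative.

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def trace_norm(A):
    """Tr|A| as the sum of singular values."""
    return np.linalg.svd(A, compute_uv=False).sum()

def bures_sq(rho, sigma):
    """Assumed convention: b^2(rho, sigma) = 1 - Tr|sqrt(rho) sqrt(sigma)|."""
    return 1.0 - trace_norm(psd_sqrt(rho) @ psd_sqrt(sigma))

def state_reduction(rho, Ms):
    """epsilon(rho, kappa_M) from the definition (7.59)."""
    total = 0.0
    for M in Ms:
        r = psd_sqrt(M)
        post = r @ rho @ r                 # unnormalized post-measurement state
        p = np.trace(post).real
        if p > 1e-14:
            total += p * bures_sq(rho, post / p)
    return total

def closed_form(rho, Ms):
    """1 - sum_i sqrt(Tr rho M_i) * Tr rho sqrt(M_i), the last equality in (7.60)."""
    return 1.0 - sum(np.sqrt(np.trace(rho @ M).real) * np.trace(rho @ psd_sqrt(M)).real
                     for M in Ms)

def bounds(rho, Ms):
    """Lower bound (exponent 3/2) and upper bound (exponent 2) on epsilon."""
    t = np.array([np.trace(rho @ psd_sqrt(M)).real for M in Ms])
    return 1.0 - (t ** 1.5).sum(), 1.0 - (t ** 2).sum()

def trine_povm():
    """Hypothetical example POVM on a qubit."""
    Ms = []
    for k in range(3):
        th = 2 * np.pi * k / 3
        v = np.array([np.cos(th / 2), np.sin(th / 2)])
        Ms.append((2 / 3) * np.outer(v, v))
    return Ms

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
eps = state_reduction(rho, trine_povm())
print(eps, closed_form(rho, trine_povm()), bounds(rho, trine_povm()))
```

The trivial POVM $\{I\}$ gives $\varepsilon = 0$, matching the observation that the state reduction vanishes when one outcome occurs with probability 1.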

However, a meaningful POVM does not necessarily have the above property, but usually has the following property in the asymptotic case, i.e., in the case of the $n$-fold tensor product state $\rho^{\otimes n}$ on $\mathcal{H}^{\otimes n}$. Let $(M^{(n)}, x^n) = \{(M^{(n)}_i, x^n(i))\}$ be a sequence of pairs of POVMs and functions to $\mathbb{R}^d$. Hence, the vector $x^n(i) = (x^{n,k}(i))$ is the measurement outcome subject to the probability distribution $P^{M^{(n)}}_{\rho^{\otimes n}}$. Suppose that the measurement outcome $x^n(i) = (x^{n,k}(i))$ satisfies the weak law of large numbers as a random variable. That is, for a given density $\rho$, there exists a vector $a\in\mathbb{R}^d$ such that

$$\operatorname{Tr}\rho^{\otimes n}M^{(n)}\{\|x^n(i) - a\| \ge \epsilon'\} \to 0, \quad \forall\epsilon' > 0. \tag{7.62}$$

For the definition of the notation $M^{(n)}\{\|x^n(i) - a\| \ge \epsilon'\}$, see (6.87). Therefore, we propose a method to perform a measurement with negligible state reduction from a POVM satisfying (7.62) as follows.

[Fig. 7.1 Measurement with negligible state reduction. Panel labels: "Just one measurement"; "Randomly perform three measurements".]


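Theorem 7.8 For a given positive real number $\delta$ and a given positive integer $l$, we define the modified POVM $M^{(n),\delta,l}$ taking values in $\mathbb{Z}^d$ in the following way. We also define the function $x^n_\delta$ from the set $\mathbb{Z}^d$ to $\mathbb{R}^d$ as $x^n_\delta(j) := \delta j$. If a sequence $\{\delta_n\}$ of real numbers and another sequence $\{l_n\}$ of integers satisfy $\delta_n \to 0$, $l_n \to \infty$, and

$$\operatorname{Tr}\rho^{\otimes n}M^{(n)}\{\|x^n(i) - a\| \ge \delta_n\} \to 0, \tag{7.63}$$

$$l_n\delta_n \to 0, \tag{7.64}$$

we then have

$$\varepsilon(\rho^{\otimes n}, M^{(n),\delta_n,l_n}) \to 0, \tag{7.65}$$

$$\operatorname{Tr}\rho^{\otimes n}M^{(n),\delta_n,l_n}\{\|x^n_{\delta_n}(j) - a\| \ge \epsilon'\} \to 0, \quad \forall\epsilon' > 0. \tag{7.66}$$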


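Construction of $M^{(n),\delta,l}$ Define

$$U_{y,\epsilon} := \bigl\{x\in\mathbb{R}^d \bigm| \forall k,\ y^k - \tfrac12\epsilon \le x^k < y^k + \tfrac12\epsilon\bigr\},$$

$$\tilde U_{y,\epsilon} := \bigl\{x\in\mathbb{R}^d \bigm| \forall k,\ y^k - \tfrac12\epsilon < x^k \le y^k + \tfrac12\epsilon\bigr\}$$

for $y = (y^k)\in\mathbb{R}^d$. Define $M^{(n)}_{y,\delta} := \sum_{x^n(i)\in U_{y,\delta}}M^{(n)}_i$. Then, $\{M^{(n)}_{\delta(lj+j'),l\delta}\}_{j\in\mathbb{Z}^d}$ is a POVM since $\sum_{j\in\mathbb{Z}^d}M^{(n)}_{\delta(lj+j'),l\delta} = I$ for arbitrary $\delta > 0$ and $j'\in\{0,\ldots,l-1\}^d$. Moreover, we define $M^{(n),\delta,l}_j := \frac{1}{l^d}M^{(n)}_{\delta j,l\delta}$. So, $M^{(n),\delta,l} = \{M^{(n),\delta,l}_j\}_{j\in\mathbb{Z}^d}$ is a POVM taking measurement outcomes in $\mathbb{Z}^d$ because it is the randomized mixture of the POVMs $\{M^{(n)}_{\delta(lj+j'),l\delta}\}_{j\in\mathbb{Z}^d}$, as in Fig. 7.1.

The existence of a sequence $\{\delta_n\}$ that satisfies condition (7.63) and $\delta_n \to 0$ can be verified from Lemma A.3 in the Appendix. Note that the choice of the POVM $M^{(n),\delta,l}$ depends only on the choice of $\delta$ and not on $\rho^{\otimes n}$. If the convergence of (7.62) is uniform for every $\epsilon' > 0$, then the convergences of (7.65) and (7.66) also do not depend on $\rho$.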


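Proof of (7.65) Let $\delta j\in U_{a,(l-1)\delta}\cap\delta\mathbb{Z}^d$. Since $\{x^n(i) \mid \|x^n(i) - a\| < \delta\}\subset U_{\delta j,l\delta}$, we obtain

$$\frac{1}{l^d}M^{(n)}\{\|x^n(i) - a\| < \delta\} \le M^{(n),\delta,l}_j. \tag{7.67}$$

From the matrix monotonicity of $x\mapsto\sqrt{x}$ and the fact that $0 \le M^{(n)}\{\|x^n(i) - a\| < \delta\} \le I$, we obtain

$$\frac{1}{l^{d/2}}M^{(n)}\{\|x^n(i) - a\| < \delta\} \le \frac{1}{l^{d/2}}\sqrt{M^{(n)}\{\|x^n(i) - a\| < \delta\}} \le \sqrt{M^{(n),\delta,l}_j}.$$

Meanwhile, since $\#(U_{a,(l-1)\delta}\cap\delta\mathbb{Z}^d) = (l-1)^d$, we have

$$\begin{aligned}
\varepsilon(\rho^{\otimes n}, M^{(n),\delta,l}) &\le 1 - \sum_{j\in\mathbb{Z}^d}\Bigl(\operatorname{Tr}\rho^{\otimes n}\sqrt{M^{(n),\delta,l}_j}\Bigr)^2 \\
&\le 1 - \sum_{\delta j\in U_{a,(l-1)\delta}}\Bigl(\operatorname{Tr}\rho^{\otimes n}\sqrt{M^{(n),\delta,l}_j}\Bigr)^2 \\
&\le 1 - \sum_{\delta j\in U_{a,(l-1)\delta}}\frac{1}{l^d}\Bigl(\operatorname{Tr}\rho^{\otimes n}M^{(n)}\{\|x^n(i) - a\| < \delta\}\Bigr)^2 \\
&= 1 - \frac{(l-1)^d}{l^d}\Bigl(\operatorname{Tr}\rho^{\otimes n}M^{(n)}\{\|x^n(i) - a\| < \delta\}\Bigr)^2.
\end{aligned}$$

From (7.63) and the fact that $l_n \to \infty$, substituting $\delta_n$ and $l_n$ in $\delta$ and $l$, respectively, we obtain $\varepsilon(\rho^{\otimes n}, M^{(n),\delta_n,l_n}) \to 0$. ∎


Proof of (7.66) If $\|\delta j - a\| \ge \epsilon'$ and $x^n(i)\in U_{\delta j,l\delta}$, then $\|x^n(i) - a\| \ge \epsilon' - \sqrt{d}\,l\delta$. It then follows that

$$M^{(n),\delta,l}\{\|x^n_\delta(j) - a\| \ge \epsilon'\} \le M^{(n)}\{\|x^n(i) - a\| \ge \epsilon' - \sqrt{d}\,l\delta\}.$$

Therefore, if $\delta_n$ and $l_n$ are chosen such that condition (7.64) is satisfied, then (7.66) is satisfied. ∎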
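
The randomized coarse-graining behind Theorem 7.8 can be sketched in one dimension ($d = 1$). The code below is an illustration rather than the construction of the text itself: it assumes that the modified element is $M_j := \frac{1}{l}\sum_{x_i\in U_{\delta j, l\delta}}M_i$, i.e., each original outcome $x_i$ is spread with weight $1/l$ over the $l$ lattice points $\delta j$ whose window $U_{\delta j, l\delta}$ contains it. The function names and the example data are hypothetical.

```python
import math
import numpy as np

def covering_indices(x, delta, l):
    """Integers j with delta*j - l*delta/2 <= x < delta*j + l*delta/2 (exactly l of them)."""
    first = math.floor(x / delta - l / 2) + 1   # smallest j with j > x/delta - l/2
    return list(range(first, first + l))

def coarse_grained_povm(Ms, xs, delta, l):
    """Map j -> M_j = (1/l) * sum of the M_i whose outcome x_i lies in U_{delta*j, l*delta}."""
    dim = Ms[0].shape[0]
    out = {}
    for M, x in zip(Ms, xs):
        for j in covering_indices(x, delta, l):
            out[j] = out.get(j, np.zeros((dim, dim), dtype=complex)) + M / l
    return out

def trine_povm():
    """Hypothetical example POVM on a qubit."""
    Ms = []
    for k in range(3):
        t = 2 * np.pi * k / 3
        v = np.array([np.cos(t / 2), np.sin(t / 2)])
        Ms.append((2 / 3) * np.outer(v, v))
    return Ms

Ms = trine_povm()
xs = [0.13, 0.52, 0.49]          # illustrative real-valued outcomes
modified = coarse_grained_povm(Ms, xs, delta=0.1, l=4)
print(sorted(modified))                                     # lattice indices that can occur
print(np.allclose(sum(modified.values()), np.eye(2)))       # elements still sum to I
```

Since every original outcome is covered by exactly $l$ windows, the modified elements again sum to the identity, and every reported value $\delta j$ lies within $l\delta/2$ of the original outcome. This is why the condition $l_n\delta_n \to 0$ in (7.64) keeps the modified outcome consistent.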


Note that similarly we can show that €(p”, Kygoo.in.n) —> 0, which will be used in 
Chap. 10. 

As discussed in Chap. 6, the asymptotic performance of an estimator can be treated with at least two criteria. One is large deviation, wherein we focus on the decreasing exponential rate of the probability that the estimate does not belong to the neighborhood of the true value with a fixed radius. The other is small deviation, wherein we focus on the asymptotic behavior of the mean square error. In mathematical statistics, it is known that the latter discussion is essentially equivalent to that of the probability that the estimate does not belong to the neighborhood of the true value with a radius proportional to $\frac{1}{\sqrt{n}}$. That is, the difference between the two criteria is essentially expressed by the difference of the asymptotic behavior of the radius of the neighborhood of interest.

As mentioned in Exercise 7.23, if the original POVM is optimal in the sense of a large deviation, the deformed one is also optimal in the same sense. However, even if the original estimator is optimal in the sense of a small deviation, the estimator deformed by the presented method is not necessarily optimal in the same sense. That is, when $\lim_{n\to\infty}\varepsilon(\rho^{\otimes n}, M^{(n),\delta_n,l_n})$ is different from the original quantity, the modification affects the accuracy of the deformed estimator in the sense of the small deviation, but not in the sense of the large deviation. Therefore, it is expected that there exists a tradeoff relation between the limit of the mean square error of the estimator and the difference between $\lim_{n\to\infty}\varepsilon(\rho^{\otimes n}, M^{(n)})$ and the original quantity.

Moreover, since a measurement with negligible state reduction has not yet been realized experimentally, its realization is strongly desired.


Exercise 


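7.23 Consider the sequence $M = \{(M^{(n)}, \hat\theta^n)\}$ of estimators for the state family with one parameter $\{\rho_\theta \mid \theta\in\mathbb{R}\}$. Let $\beta(M, \theta, \epsilon)$ be continuous with respect to $\epsilon$. Show that $\beta(M, \theta, \epsilon) = \beta(\{(M^{(n),\delta_n,l_n}, x^n_{\delta_n})\}, \theta, \epsilon)$ when $l_n\delta_n \to 0$, where $M^{(n),\delta_n,l_n}$ is defined in the above discussion.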


7.5 Historical Note 


The mathematical description of a measurement process was initiated by 
von Neumann [1]. In his formulation, the measurement is described by a projection- 
valued measure. From the mathematical viewpoint, Naimark [2] showed that any 
POVM can be characterized as the restriction of the projection-valued measure. This 
projection-valued measure is called a Naimark extension. Holevo applied this argu- 
ment to quantum measurements [19]. 
Further, Davies and Lewis [20] formulated the state reduction as a positive-map-valued
measure. Following this research, Ozawa [3] proved that any measurement
reduction should be described by a CP-map-valued measure, i.e., an instrument. He 
also proposed the indirect measurement model as a description of the interaction with 
the macroscopic system [21, 22]. Indeed, a positive-map-valued measure {k,,} is a 
CP-map-valued measure if and only if it can be described by an indirect measurement 
model [3]. For any POVM, its indirect measurement model gives a Naimark exten- 
sion of this POVM (Theorem 7.1). For example, an indirect measurement model of 
the joint measurement of the position Q and the momentum P is known (see Holevo 
[19]). Further, Hayashi et al. [23] constructed indirect measurements for a mean- 
ingful POVM for squeezed states. For a more precise description of state reduction, 
see Holevo [24], who discusses state reductions due to continuous-variable mea- 
surements using semigroup theory. Busch et al. [25] discuss the connection between 
this formulation and experiments. In addition, Ozawa characterized the instrument 
given by (7.1) as a minimal-disturbance measurement [22, 26]. Furthermore, this 
book treats state reductions where the input and output systems are different systems 
because such a reduction is common in quantum information processing. Hence, 
this book focuses on Theorem 7.3 as a generalization of Corollary 7.1 obtained 
by Ozawa [3]. 

The uncertainty relation between conjugate observables was discussed in the 
context of gedanken experiments by Heisenberg [6]. It was first treated mathe- 
matically by Robertson [5], who was not, however, concerned with the effect of 
measurement. Recently, Ozawa [7-10, 22] formulated the disturbance by measurement,
and treated the uncertainty relation concerning measurement mathematically.
These are undoubtedly the first attempts at a mathematically rigorous treatment of
Heisenberg uncertainty. In this book, we mathematically formulate the same problem,
but in a different way. In particular, the definition of disturbance in this text is different
from that by Ozawa. Hence, the inequality given in this text is a slightly stronger
requirement than that of Ozawa. However, the methods of Ozawa's proof and ours
are almost identical. For further discussion of the historical perspective of this topic, 
see Ozawa [9]. Indeed, Ozawa considered inequality (7.30) to be the mathematical 
formulation of Heisenberg’s uncertainty relation, and he gave its counterexample. 
He also proposed another type of uncertainty relation, (7.36) and (7.37), due to
measurement. However, in this book, inequality (7.31) is treated as the mathematical 
formulation of Heisenberg’s uncertainty relation. Therefore, the discussion in this 
book is different from that of Ozawa. 

Concerning the mixed-state case, Nagaoka [11] generalized inequality (7.27) to inequality (7.28). [Indeed, the RHS of Nagaoka's original inequality has a different expression; however, it is equal to the RHS of (7.28).] This is a stronger inequality than the trivial generalization $\Delta_1(X, \rho)\Delta_1(Y, \rho) \ge \frac{|\operatorname{Tr}\rho[X,Y]|}{2}$. All inequalities in Sect. 7.2 are based on the former.

Further, using inequality (7.28), Nagaoka [11] derived inequality (7.33) in the mixed-state case. The same inequality with the RHS $\frac{|\operatorname{Tr}\rho[X,Y]|}{2}$ has been discussed by many researchers [27-30]. Nagaoka applied this inequality to the Cramér-Rao-type bound and first obtained the bound (6.109) in the two-parameter case. Hayashi [31] extended this inequality to the case with more than two parameters.

The extension of the uncertainty relation to the entropic uncertainty relation was first given by Maassen and Uffink [12] as Theorem 7.6. The proof presented in this book is based on the Golden-Thompson trace inequality (Lemma 5.4) and was given by Frank and Lieb [32]. This inequality was extended to the case with conditional entropies as Theorem 7.7. Renes and Boileau [16] showed inequality (7.50) in the mutually unbiased case, and conjectured it in the general form. Then, Berta et al. [17] showed it in the general form by representing the theorem in a different form. Then, Coles et al. [13] showed (7.47), and Müller-Lennert et al. [14] showed (7.48). Then, following the framework of [13], Tomamichel et al. [15] showed (7.49).

The study of measurements with a negligible state reduction has been moti- 
vated by quantum universal variable-length source coding because quantum universal 
variable-length source coding requires determination of the compression rate with 
small state reduction (Sect. 10.5). Hayashi and Matsumoto [18] treated this problem 
and obtained the main idea of Sect. 7.4. This method is useful for estimating the 
state of the system without considerable state reduction. This method is often called 
gentle tomography. Bennett et al. [33] considered the complexity of this kind of 
measurement. 


7.6 Solutions of Exercises 


Exercise 7.1 Choose the unitary matrix (ee |Uy) (Uy | ® Un:,) (I,c ® U') and the 
state p', ® po as the unitary U and the state po in Condition ©. Then, the unitary U 
and the state po satisfy (7.10). 


Exercise 7.2 Let Hp be Hc ® Hz, consider the unitary matrix corresponding to the 


replacement W:u@vutr>v@uin Hy, ® Hg, and define V = (W @ Ic)U. Then, 
V satisfies (7.11). 


Exercise 7.3 Equation (5.2) yields that Tr pM/, ,, = Tr pk3(M.y) = Tr K3(p) Muy. 


Exercise 7.4 Since 
Te La @ [ui) (will) x] = vi) (vil, 
the resultant state is pple) (v;|. 


Exercise 7.5 


(a) Assume that « is given in @. When the input state is an entangled state |W) := 
> alu*, uP), we have 


K@tc(V)(H) = D1 Tr MY) (Y| @ Wa, 
which is a separable form between Hg and Hc.

(b) Let 3°; pip? @ p? = (K @ t4’)(\xm) (xu). Then, due to (5.5), the Choi- 
Jamiotkowski matrix has the separable form, i.e., K(k) = d >°, pip* ® p?. From the 
definition of K(k), we have Tr K(p)o = Tr K(k) p@o =d >, pi(Tr pp#)(Trop?). 
Thus, we obtain «(p) = d >°; pi(Tr pp?) p?. 


Exercise 7.6 


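$$\begin{aligned}
\Delta_2^2(M,\rho) &= \sum_i (x_i - E_\rho(M))^2\,\mathrm{Tr}\,\rho M_i = \mathrm{Tr}\sum_i (x_i - \mathrm{Tr}\,\rho O(M))^2 M_i\,\rho \\
&= \mathrm{Tr}\Big(\sum_i (x_i - \mathrm{Tr}\,\rho O(M))M_i\Big)\Big(\sum_j (x_j - \mathrm{Tr}\,\rho O(M))M_j\Big)\rho \\
&= \mathrm{Tr}\,(O(M) - \mathrm{Tr}\,\rho O(M))^2\rho = \Delta_1^2(O(M),\rho).
\end{aligned}$$
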
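Exercise 7.7

$$\begin{aligned}
\Delta_3^2(M,O(M),\rho) &= \sum_i \mathrm{Tr}\,(x_i - O(M))M_i(x_i - O(M))\rho \\
&= \sum_i \mathrm{Tr}\,x_i^2M_i\rho - \mathrm{Tr}\,O(M)^2\rho \\
&= \sum_i \mathrm{Tr}\,x_i^2M_i\rho - (\mathrm{Tr}\,\rho O(M))^2 - \mathrm{Tr}\,O(M)^2\rho + (\mathrm{Tr}\,\rho O(M))^2 \\
&= \Delta_2^2(M,\rho) - \Delta_1^2(O(M),\rho).
\end{aligned}$$
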
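Exercise 7.8

$$\begin{aligned}
\Delta_3^2(M,X,\rho) &= \sum_i \mathrm{Tr}\,(x_i - X)M_i(x_i - X)\rho \\
&= \sum_i \mathrm{Tr}\,x_i^2M_i\rho - \mathrm{Tr}\,X\rho\,O(M) - \mathrm{Tr}\,O(M)\rho X + \mathrm{Tr}\,X^2\rho \\
&= \sum_i \mathrm{Tr}\,x_i^2M_i\rho - \mathrm{Tr}\,O(M)^2\rho + \mathrm{Tr}\,O(M)^2\rho \\
&\quad - \mathrm{Tr}\,X\rho\,O(M) - \mathrm{Tr}\,O(M)\rho X + \mathrm{Tr}\,X^2\rho \\
&= \sum_i \mathrm{Tr}\,(x_i - O(M))M_i(x_i - O(M))\rho + \mathrm{Tr}\,(X - O(M))^2\rho \\
&= \Delta_3^2(M,O(M),\rho) + \Delta_1^2(O(M) - X,\rho).
\end{aligned}$$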


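Exercise 7.9 We denote the spectral decomposition of $O(M)$ by $\{(E_{O(M),j}, y_j)\}_j$.
Then, we have $\mathrm{Tr}\,O(M)^2\kappa(\rho) = \sum_j y_j^2\,\mathrm{Tr}\,E_{O(M),j}\kappa(\rho) = \sum_j y_j^2\,\mathrm{Tr}\,\kappa^*(E_{O(M),j})\rho$
and $\sum_i x_i\kappa^*(M_i) = \kappa^*(\sum_i x_iM_i) = \kappa^*(O(M)) = \kappa^*(\sum_j y_jE_{O(M),j}) = \sum_j y_j\kappa^*(E_{O(M),j})$.
Using these relations, we have

$$\begin{aligned}
\Delta_3^2(\kappa^*(M),X,\rho) &= \sum_i \mathrm{Tr}\,(x_i - X)\kappa^*(M_i)(x_i - X)\rho \\
&= \sum_i \mathrm{Tr}\,x_i^2\kappa^*(M_i)\rho - \mathrm{Tr}\Big(\sum_i x_i\kappa^*(M_i)\Big)\rho X - \mathrm{Tr}\,X\rho\Big(\sum_i x_i\kappa^*(M_i)\Big) + \mathrm{Tr}\,X^2\rho \\
&= \sum_i \mathrm{Tr}\,x_i^2M_i\kappa(\rho) - \mathrm{Tr}\,O(M)^2\kappa(\rho) + \sum_j y_j^2\,\mathrm{Tr}\,\kappa^*(E_{O(M),j})\rho \\
&\quad - \mathrm{Tr}\sum_j y_j\kappa^*(E_{O(M),j})\rho X - \mathrm{Tr}\,X\rho\sum_j y_j\kappa^*(E_{O(M),j}) + \mathrm{Tr}\,X^2\rho \\
&= \sum_i \mathrm{Tr}\,(x_i - O(M))M_i(x_i - O(M))\kappa(\rho) + \sum_j \mathrm{Tr}\,(y_j - X)\kappa^*(E_{O(M),j})(y_j - X)\rho \\
&= \Delta_3^2(M,O(M),\kappa(\rho)) + \Delta_3^2(\kappa^*(E_{O(M)}),X,\rho).
\end{aligned}$$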


Exercise 7.10 


(a) Since Tr, p Trc[U* (I @ E)UT ® po] = Tr 1 @ EUp® poU* = Tre E Tra Up® 
poU* = Trc Ex(p) = Trc w*(E)p for any state p on Hy, we have Trc U*(I ® 
E)UI ® po = «*(E). Thus, 


Tr(X @ 1 —x)U*(1 @ E)U(X @1—x)p® po 
=Tr X @IU*(I ® E)UX @ 1p ®@ po — Tr(X @ I)xU* (I ® E)NUP® po 
— TrxU*(I ® E)U(X @1)p ® po + Trx’U* (I ® E)UU* (I @ E)Up® po 
=Tr X @1U*(I ® ENUX @1p@ po — Tr(X @IxU* (1 ® E)NUP® po 
—TrxU*(1 ® E)U(X @1)p ® po + Trx°U*(I @ E)Up® po 
=Tr XpX TrlU*( @ E)UI ® po] — Tr pXx Tr[U*(L @ E)UI ® pol 


alt Tr[U*(C @ E)UI ® polxXp+ Trx*p Tr[U*( @ E)UI ® po] 
=Tr XK*(E)Xp— Tr pXxK*(E) — Tr K*(E)xXp+ Tr x°px*(E) 


= Tr(X — x)K*(E)(X — x)p. 


(b) We denote the spectral decomposition of Y by {(Ey,;, y;)};. Then, 


A3(K*(Ey), X, p) 
=r by — X)K*(Ey,;)(yj — X)p 
j 
=Tr > \(X @1 — yj)U*( ® Ey,; U(X ® I — yi) p ® po 
J 


=Tr > “[(X ® IU* (1 @ Ey, U(X @ I) — X®1U* (I @ yj Ey,j)U 
J 
— U*(I @ yjEy,j)UX®I + U*(I @ yi Ey,j)U]p ® po 
=Tr(X* @1 — X®1U*(1 @ YU — U*(1 @ Y)UX® 1 + U* (1 @ Y*)U)p ® po 
=Tr (U(X? @ 1)U* — U(X®1)U*(1 @ Y) — 1 @ Y)U(X®1)U* 
+ (I ® Y’))Up® poU* 
=Tr (U(X ® Ip,c)U* — (Ia,c ® Y))” U(p ® po) U*. 


Exercise 7.11 
A3(M, X, p) = >) Tra; — X)Mi(xi — X)p = D1 Tr My (aj — X) pi — X) 
=> ECG; — X @1)p® poly — X @1)) 
=> Tr@; —X @INE (x; — X @1I)p® po 


aT iG, 2, — xX, EV(X @1)—-(X @INxE; + (X @IE(X @!1)|p ®@ po 


L 


=Tr[(O(E)? — O(E)(X @ I) — (X @ NO(E) + (X2@ 1) ® po 
=Tr(O(E) — (X @1))°p® po 
=Aj}(O(E) — X ® Ig, p® po). 


Exercise 7.12 
(a) Since $c + d \ge b$, we have $b^2 \le d^2 + c^2 + 2cd \le d^2 + c^2 + 2ad$. Thus,
$a^2 + b^2 - c^2 \le a^2 + d^2 + 2ad$, which implies $\sqrt{a^2 + b^2 - c^2} \le a + d$.

(b) Using (7.19) and (7.18), we have
$$\begin{aligned}
\Delta_2^2(M,\rho) &= \Delta_3^2(M,O(M),\rho) + \Delta_1^2(O(M),\rho) \\
&= \Delta_3^2(M,X,\rho) - \Delta_1^2(O(M) - X,\rho) + \Delta_1^2(O(M),\rho),
\end{aligned}$$
and $\Delta_3(M,X,\rho) \ge \Delta_1(O(M) - X,\rho)$. Since $\Delta_1(X,\rho)$ can be regarded as a norm
for $X$, we have $\Delta_1(O(M) - X,\rho) + \Delta_1(X,\rho) \ge \Delta_1(O(M),\rho)$.

Applying (a) to the case when $a = \Delta_3(M,X,\rho)$, $b = \Delta_1(O(M),\rho)$,
$c = \Delta_1(O(M) - X,\rho)$, and $d = \Delta_1(X,\rho)$, we have
$$\Delta_2(M,\rho) = \sqrt{\Delta_3^2(M,X,\rho) - \Delta_1^2(O(M) - X,\rho) + \Delta_1^2(O(M),\rho)} \le \Delta_3(M,X,\rho) + \Delta_1(X,\rho).$$


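Exercise 7.13 Since $\frac{\mathrm{Tr}|\sqrt{\rho}[O(M)-X,Y]\sqrt{\rho}|}{2} \le \Delta_1(O(M) - X,\rho)\Delta_1(Y,\rho)$, (7.31)
implies that

$$\begin{aligned}
\Delta_2(M,\rho)\Delta_4(\kappa,Y,\rho) &\ge \frac{\mathrm{Tr}|\sqrt{\rho}[O(M),Y]\sqrt{\rho}|}{2} \\
&= \frac{\mathrm{Tr}|\sqrt{\rho}[X,Y]\sqrt{\rho} + \sqrt{\rho}[O(M)-X,Y]\sqrt{\rho}|}{2} \\
&\ge \frac{\mathrm{Tr}|\sqrt{\rho}[X,Y]\sqrt{\rho}|}{2} - \frac{\mathrm{Tr}|\sqrt{\rho}[O(M)-X,Y]\sqrt{\rho}|}{2} \\
&\ge \frac{\mathrm{Tr}|\sqrt{\rho}[X,Y]\sqrt{\rho}|}{2} - \Delta_1(O(M) - X,\rho)\Delta_1(Y,\rho).
\end{aligned}$$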


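Exercise 7.14 Notice that (7.19) implies $\Delta_3(M,X,\rho) \ge \Delta_1(O(M) - X,\rho)$. Using
(7.35) and this inequality, we have

$$\begin{aligned}
&(\Delta_3(M,X,\rho) + \Delta_1(O(M),\rho))\,\Delta_4(\kappa,Y,\rho) + \Delta_3(M,X,\rho)\Delta_1(Y,\rho) \\
&\ge \Delta_2(M,\rho)\Delta_4(\kappa,Y,\rho) + \Delta_3(M,X,\rho)\Delta_1(Y,\rho) \\
&\ge \Delta_2(M,\rho)\Delta_4(\kappa,Y,\rho) + \Delta_1(O(M) - X,\rho)\Delta_1(Y,\rho) \\
&\ge \frac{\mathrm{Tr}|\sqrt{\rho}[X,Y]\sqrt{\rho}|}{2}.
\end{aligned}$$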


Exercise 7.15 Choose a 2 × 2 orthogonal matrix $(a_{i,j})$ such that the two matrices
$\tilde{X} := a_{1,1}X + a_{1,2}Y$ and $\tilde{Y} := a_{2,1}X + a_{2,2}Y$ satisfy $\mathrm{Cov}_\rho(\tilde{X},\tilde{Y}) = 0$. Then, (7.39)
for $\tilde{X}, \tilde{Y}$ is equivalent to (7.28) for $\tilde{X}, \tilde{Y}$. Since both sides of (7.39)
are invariant under the orthogonal matrix transformation $(X,Y) \mapsto (\tilde{X},\tilde{Y})$, (7.39)
holds for $X, Y$.


Exercise 7.16 Use the fact Si ° S; = 6; Si and Tr S; Sj = 26), j- 


(e) Since the relation ||x ||? (y, a)? = ||(x x y) x a||? holds in this special case, (7.40) 
holds. 

(g) Both sides of (7.40) are invariant under the transformation (x, y) > (x, y) 
because the determinant of (b;,;) is 1. Then, the statement (e) implies (7.40) for 
arbitrary vectors x, y, a. 

(h) Both sides of (7.39) are invariant under the transformation (X,Y) h (X — 
xI,Y —yl). 


Exercise 7.17 We have 
1 
o'\(M) = >° Ge — Tr pX)+ TpX) pEx; + > (Tr pX)(1 — p)Ey,j 
i j 
= X —(TrpX)/ + p(Trp)7 + — p)(Trp)l = x. 


Similarly, we have O?(M) = Y. We also have 
A3(My.y.p, X, p) 
1 2 
== (So — Tr pX) +Tpx) Tr pExip 
_ \P 


+ >) (Tr pX)? Tr(l — p) Ey, jp — (Tr pX)” — Tr p(X — (Tr pX))” 
j 


1 2 
= > (Ss — Tr pX)? + (Tr pX)? + ma — Tr pX)(Tr px)) Tr pEy.ip 
+ (1 — p)(Tr pX)* — (Tr pX)? — Tr p(X — (Tr pX))° 
1 
= (Co — Tr pX)? + p(Tr pX)* + 2(x; — Tr pX)(Tr px)) Tr Ey ip 
ee 


+ (1 — p)(Tr pX)* — (Tr pX)° — Tr p(X — (Tr pX))? 

_\ p(X — Tr pX)? + p(Tr pX)* + 2Tr p(X — Tr pX)(Tr pX) 
P 
+ (1 — p)(Tr pX)* — (Tr pX)* — Tr p(X — (Tr pX))? 


1 ie 
=—Tr p(X — Tr pX)? — Tr p(X — (Tr pX))? = —"Ai(X, p). 
P P 
Similarly, we have 


Pp 
A3(Mx,y,9,Y, p) = i-po p). 


Hence, when X, Y, p satisfy the equality in (7.28), the equality in (7.33) holds. 


Exercise 7.18 When © does not hold, @ does not hold due to (7.31). Hence, @ 
implies ©. 

Assume @. Choose the spectral decomposition {(Ey.,, X)}, of X. Then, choos- 
ing Ky aS Ky (p) := Ex pEx,,., we obtain the conditions for @. 


Exercise 7.19 Since 
Tr pM; = Tr pis ((4 ® ./o) U* (Ia, ® Ei) U (I, ® \/po)) 


we have 
O(M) = >°xiM 
= Do Tr ((l1 @ VP) U* (11,8 @ Ei) U (14 @ Po) 
= Tr ((J4 @ VP) U" (Ja.8 @ OCB)) U (In @ VF). 


Exercise 7.20 


(a) Consider the exponential family pg generated by state p with the SLD X, where 
fo = p. Then, the SLD Fisher informations of the families \«) (pg) + (1 — A)K2(p9)» 


K1(p9) and K2(p9) at 6 = 0 are (I|AK\(X) + U — A)r2(X) keener =\)x2(p), 7: , 


2. 
(ioe ) , and ((xeolhe y _ Thus, Exercise 6.26 implies that 


Ki (p),8 K2(p),s 
2 
(1) + d= COIS (yn neatons) (7.68) 
2 2. 
< (Oy) + (ICOM 5) - (7.69) 


Hence, (7.25) implies that 
Ai(AK, + (1 = A)Ka, X, p) 
e)\2 e ce 
= (IX) - (IX) +(1- DRCNet ites) 


2 
> (IX)? =A (I COM, bi = =) (Ie 201.5) 
=A} (K1, X, p) + (1 — A)AR(K2, X, p). 


(b) Due to Exercise 6.26, the assumption of (b) satisfies the equality condition for 
(7.69). So, the desired equality holds. 


Exercise 7.21 We can show O(M) = X by the same way as O'(M) = X in 
Exercise 7.17. We can also show that A2(M, p) = A1(X, p)/p by the same way as 
O'(M) = X in Exercise 7.17. 

Define two TP-CP maps kg := Ko/(1 — p) and Kp := yo k;/p. Since Kg is an 
operation making nothing, we have 


A4(Ka, Y, p) = 0. 
Then, due to (7.32), the equality in (7.28) implies that 
Aa(Kp, Y, p) = Ai(Y, p). 


Since 1 and 1’ are orthogonal to each other, Exercise 7.20 guarantees that 
Ag(k, Y, p) = (1 — p)Ag(Ka, Y, p) + pAg(Ko, Y, p) = pAi(Y, p). 
Therefore, we have 
Ax(M, p)Aa4(K, Y, p) = A(X, p)Ai(Y, p). 


Hence, the equality in (7.28) guarantees the equality in (7.31). 


Exercise 7.22 The relation (A.17) implies that ||, /M/Ny || = || \/ /NyMx/Ny || = 
| MMe lvy)Pley) (vylf] = I eexlvy)llvy) (yl = Marlvy)L- 


Exercise 7.23 From the definition of M@->! , we have 


M"{\0 — 6"| > e+ } CM Pr 119 — x8 | > €} 


n On ln 1 
CM"{|0 — 6"| > ey 


On ln + 1) 
2 


: Onn +1) 
Since aac ie 


— 0, we obtain 


60 


lim 3(M, 0, € + 5) > BM", x2.)}, 0, ©) = lim B(M, 6, € — 9). 


As 3(M, @, €) is continuous with respect to «, we obtain the desired argument. 
Chapter 8 
Entanglement and Locality Restrictions 


Abstract Quantum mechanics violates everyday intuition not only because the
measured outcome can only be predicted probabilistically but also because of a
quantum-specific correlation called entanglement. It is believed that this type of correlation
does not exist in macroscopic objects. Entanglement can be used to produce nonlocal
phenomena. States possessing such correlations are called entangled states (or states
that possess entanglement). A state on a bipartite system is called a maximally
entangled state or an EPR state when it has the highest degree of entanglement
among these states. Historically, the idea of a nonlocal effect due to entanglement was
pointed out by Einstein, Podolsky, and Rosen; hence, the name EPR state. In order to
transport a quantum state over a long distance, we have to retain its coherence during
its transmission. However, this is often very difficult because the transmitted system can
easily become correlated with the environment system. If the sender and receiver share an
entangled state, the sender can transport his/her quantum state to the receiver without
transmitting it, as explained in Chap. 9. This protocol is called quantum teleportation
and clearly explains the effect of entanglement in quantum systems. Many other
effects of entanglement have also been examined, some of which are given in Chap. 9.
However, it is difficult to take advantage of entanglement if the shared state is
insufficiently entangled. Therefore, we investigate how much of a maximally entangled
state can be extracted from a partially entangled state. Of course, if we
allow quantum operations between two systems, we can always produce maximally
entangled states. Therefore, we examine cases where locality conditions are imposed
on our possible operations.


8.1 Entanglement and Local Quantum Operations 


As explained in Chap. 9, when there are two players, they can perform several
magical protocols using a maximally entangled state defined in Sect. 1.4. However,
not every entangled state is a maximally entangled state. To perform
such magical protocols based on a partially entangled state, we need to convert the
partially entangled state. If we are allowed any quantum operation, we can generate
a maximally entangled state. Hence, it is usual to impose a locality condition on our
operations. The most usual condition is the condition for local quantum operations
and classical communications (LOCC). That is, each player is allowed to perform local
quantum operations, and classical communication is allowed between the two players.
Now, we consider what operations are possible on an entangled state under this
condition.

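As stated in Sect. 1.2, a pure state on $\mathcal{H}_A$ can be represented by an element $u$ on
$\mathcal{H}_A$ with the norm $\|u\|$ equal to 1. A pure entangled state in the composite system is
represented by an element $x$ on $\mathcal{H}_A \otimes \mathcal{H}_B$. Using the basis $u_1,\ldots,u_d$ for $\mathcal{H}_A$ and
the basis $v_1,\ldots,v_{d'}$ for $\mathcal{H}_B$, $x$ can be written as $x = \sum_{i,j} x^{i,j}|u_i\rangle \otimes |v_j\rangle$. Let us
define the matrix form of $x$ as the linear map $X_x$ from $\mathcal{H}_B$ to $\mathcal{H}_A$ with respect to $x$
by $|x\rangle = |X_x\rangle$. Then,

$$X_x = \sum_{i,j} x^{i,j}|u_i\rangle\langle v_j|. \qquad (8.1)$$

Therefore, the correspondence $x \mapsto X_x$ gives a one-to-one relationship between
these matrices and the elements of $\mathcal{H}_A \otimes \mathcal{H}_B$ under fixed bases on $\mathcal{H}_A$ and $\mathcal{H}_B$. From
(1.23), any tensor product $u \otimes v$ satisfies

$$\langle u \otimes v|x\rangle = \langle u \otimes v|X_x\rangle = \mathrm{Tr}\,|\bar{v}\rangle\langle u|X_x = \langle u|X_x|\bar{v}\rangle,$$

where $\bar{v}$ denotes the complex conjugate of $v$ with respect to the basis $v_1,\ldots,v_{d'}$.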


Now, let X’. be the same as X,, but defined ina different basis v’, ij for B, and define U = 


> es (vploj(vplor)|oj) (ur. Since |vj) = DE v,|v;)|v;) and (v;,| = = Dae! vi luz) (vi, 
we have 


Xi = Do xh (ugloj) ls) (vg = DO x (yp lyj ug lon) aes) (vil = XU. 


i,j,k i,j,kl 


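That is, the definition of $X_x$ depends on the orthonormal basis of $\mathcal{H}_B$.
Further, we have

$$\rho_x := \mathrm{Tr}_B|x\rangle\langle x| = \sum_j\Big(\sum_i x^{i,j}|u_i\rangle\Big)\Big(\sum_{i'}\overline{x^{i',j}}\langle u_{i'}|\Big) = \sum_{i,i'}\Big(\sum_j x^{i,j}\,\overline{x^{i',j}}\Big)|u_i\rangle\langle u_{i'}| \qquad (8.2)$$

$$= X_xX_x^*. \qquad (8.3)$$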


Now, let us denote the nonzero eigenvalues of $\rho_x$ in (8.3) by $\lambda_1,\ldots,\lambda_l$. Then, we can
apply the arguments given in Sect. A.2 to the matrix $X_x$. Choosing sets of orthogonal
vectors of length 1 as $u_1',\ldots,u_l'$ and $v_1',\ldots,v_l'$, we obtain $x = \sum_{i=1}^{l}\sqrt{\lambda_i}\,|u_i'\rangle \otimes |v_i'\rangle$.
The right-hand side (RHS) of this equation is often called the Schmidt decomposition,
and $\sqrt{\lambda_i}$ is called the Schmidt coefficient. Of course,

$$\mathrm{Tr}_B|x\rangle\langle x| = \sum_i \lambda_i|u_i'\rangle\langle u_i'|, \qquad \mathrm{Tr}_A|x\rangle\langle x| = \sum_i \lambda_i|v_i'\rangle\langle v_i'|.$$

Hence, both have the same eigenvalues and entropies. However, the above is true
only if the state on the composite system is a pure state. The number of nonzero
Schmidt coefficients $\sqrt{\lambda_i}$ is called the Schmidt rank and is equal to the ranks of
both $\mathrm{Tr}_B|x\rangle\langle x|$ and $X_x$.
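
The Schmidt coefficients can be computed directly from the matrix form $X_x$ of (8.1), since they are the square roots of the eigenvalues of $X_xX_x^*$ in (8.3). The following is a minimal sketch for the two-qubit case only, where the two eigenvalues follow from the trace and determinant of a 2 × 2 matrix. The function name and the three sample vectors are illustrative assumptions, not part of the text.

```python
import math

def schmidt_coefficients_2x2(x):
    """Schmidt coefficients of a two-qubit vector x = [x00, x01, x10, x11],
    where x[2*i + j] is the coefficient of |u_i> (x) |v_j>.
    (Hypothetical helper; restricted to 2 x 2 for simplicity.)"""
    # Matrix form X_x with entries X[i][j] = x^{i,j}, as in (8.1).
    X = [[x[0], x[1]], [x[2], x[3]]]
    # Tr(X X^*) is the squared norm of x; det(X X^*) = |det X|^2.
    t = sum(abs(c) ** 2 for c in x)
    d = abs(X[0][0] * X[1][1] - X[0][1] * X[1][0]) ** 2
    # Eigenvalues of the 2 x 2 matrix X X^* from its trace and determinant.
    disc = math.sqrt(max(t * t - 4 * d, 0.0))
    eigenvalues = [(t + disc) / 2, (t - disc) / 2]
    return [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]

def schmidt_rank(coefficients, tol=1e-9):
    """Number of nonzero Schmidt coefficients."""
    return sum(1 for c in coefficients if c > tol)

bell = schmidt_coefficients_2x2([1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)])
product = schmidt_coefficients_2x2([1, 0, 0, 0])
partial = schmidt_coefficients_2x2([math.sqrt(0.9), 0, 0, math.sqrt(0.1)])
```

In this sketch a product vector has Schmidt rank 1, while the other two vectors have Schmidt rank 2; only the first has both coefficients equal.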

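Conversely, for a general state $\rho^A$ on $\mathcal{H}_A$, a pure state $|u\rangle\langle u|$ on $\mathcal{H}_A \otimes \mathcal{H}_R$ satisfying
$\rho^A = \mathrm{Tr}_R|u\rangle\langle u|$ is called the purification of $\rho^A$. The quantum system $\mathcal{H}_R$ used for
the purification is called the reference and is denoted R.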


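Lemma 8.1 For two pure states $|x\rangle\langle x|$ and $|y\rangle\langle y|$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_R$,
the following two conditions are equivalent.

① The Schmidt coefficients of $|x\rangle\langle x|$ and $|y\rangle\langle y|$ coincide, i.e.,
$$x = \sum_i \sqrt{\lambda_i}\,|u_i\rangle \otimes |v_i\rangle, \qquad y = \sum_i \sqrt{\lambda_i}\,|u_i'\rangle \otimes |v_i'\rangle.$$

② There exist unitary matrices $U^A$, $U^R$ in A and R such that
$$x = (U^A \otimes U^R)\,y. \qquad (8.4)$$

Furthermore, if
$$\mathrm{Tr}_R|x\rangle\langle x| = \mathrm{Tr}_R|y\rangle\langle y|, \qquad (8.5)$$
then
$$x = (I \otimes U^R)\,y \qquad (8.6)$$
for a suitable unitary matrix $U^R$ on R. Therefore, any purification of the mixed state
$\rho^A$ on $\mathcal{H}_A$ can be transformed into any other by a unitary matrix on R. When the
states $\mathrm{Tr}_A|x\rangle\langle x|$ and $\mathrm{Tr}_A|y\rangle\langle y|$ are not full rank, $U^R$ needs to be chosen as a
partial isometry.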


Proof ②⇒① by inspection. If unitary matrices $U^A$ and $U^R$ on $\mathcal{H}_A$ and $\mathcal{H}_R$, respectively,
are chosen such that $U^A(u_i') = u_i$ and $U^R(v_i') = v_i$, then (8.4) is satisfied, and
hence ①⇒②. From (8.5), $X_xX_x^* = X_yX_y^*$. From (A.9), choosing appropriate unitary
matrices $U_x$ and $U_y$, we have $X_x = \sqrt{X_xX_x^*}\,U_x$ and $X_y = \sqrt{X_xX_x^*}\,U_y$. Therefore,
$X_x = X_yU_y^*U_x$. Then, (8.6) can be obtained from (1.22). ■


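Therefore, pure entangled states can be classified according to their Schmidt coefficients.
In particular, when all the Schmidt coefficients are equal to $\sqrt{1/L}$, the state is
called a maximally entangled state of size $L$. Any maximally entangled state of size
$L$ can be obtained from a maximally entangled state $|\Phi_L\rangle\langle\Phi_L|$ by local operations.
Hence, we can examine the properties of maximally entangled states of size $L$
by treating a typical maximally entangled state $|\Phi_L\rangle\langle\Phi_L|$. A maximally entangled
state is separated from separable states as follows.

$$\max_{\sigma \in S}\langle\Phi_L|\sigma|\Phi_L\rangle = \frac{1}{L}, \qquad (8.7)$$

where $S$ is the set of separable states.
Let $u$ and $v$ be unit vectors, and let $\bar{v}$ be the complex conjugate of $v$ in the basis $\{v_j\}$.
Since $\langle\Phi_L|u \otimes v\rangle = \langle X_{\Phi_L}\bar{v}|u\rangle$ and $\|X_{\Phi_L}\bar{v}\|^2 = \langle\bar{v}|X_{\Phi_L}^*X_{\Phi_L}|\bar{v}\rangle \le \frac{1}{L}$
(every nonzero eigenvalue of $X_{\Phi_L}^*X_{\Phi_L}$ equals $1/L$), we have

$$|\langle\Phi_L|u \otimes v\rangle|^2 \le \|X_{\Phi_L}\bar{v}\|^2\,\|u\|^2 \le \frac{1}{L}.$$

Since any separable state $\sigma$ is written as a mixture of separable pure states, we have
$\langle\Phi_L|\sigma|\Phi_L\rangle \le \frac{1}{L}$. When $u$ is equal to $\sqrt{L}\,X_{\Phi_L}\bar{v}$, $\langle\Phi_L|u \otimes v\rangle = \frac{1}{\sqrt{L}}$, which implies
(8.7).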
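
The bound (8.7) can be probed numerically on product vectors, which suffices because a separable state is a mixture of them. The sketch below assumes the standard form $\Phi_L = L^{-1/2}\sum_i |i\rangle \otimes |i\rangle$ and samples random unit vectors with a fixed seed; the helper names, the choice $L = 3$, and the sample size are illustrative assumptions.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Random complex unit vector (normalized complex Gaussian entries)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]

def overlap_sq(u, v):
    """|<Phi_L|u (x) v>|^2 for Phi_L = L^{-1/2} sum_i |i>|i> (assumed form)."""
    L = len(u)
    amplitude = sum(a * b for a, b in zip(u, v)) / math.sqrt(L)
    return abs(amplitude) ** 2

rng = random.Random(0)
L = 3
# Largest overlap found among randomly sampled product vectors.
largest = max(
    overlap_sq(random_unit_vector(L, rng), random_unit_vector(L, rng))
    for _ in range(2000)
)
# Choosing v as the complex conjugate of u attains the bound 1/L.
u = random_unit_vector(L, rng)
attained = overlap_sq(u, [c.conjugate() for c in u])
```

Random sampling only illustrates the inequality and does not prove it; the conjugate choice of $v$ mirrors the equality condition in the argument above.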

Next, we discuss state operations consisting of local operations (LO) and classical
communications (CC). These can be classified into three classes, as in Fig. 8.1: (i) only
classical communications from A to B or from B to A are allowed (this class is
called one-way LOCC; it is denoted by → when classical communication from
A to B is allowed, and by ← when classical communication from B to
A is allowed); (ii) classical communications from A to B and from B to A are allowed
(this class is called two-way LOCC and is denoted by ↔); and (iii) no classical
communications are allowed, i.e., only local quantum operations are allowed (this class
is denoted by ∅).¹ In terms of the Choi–Kraus representation of the TP-CP map given
in © of Theorem 5.1, the state evolutions may be written

$$\kappa(\rho) = \sum_i (E_{A,i} \otimes E_{B,i})\,\rho\,(E_{A,i}^* \otimes E_{B,i}^*). \qquad (8.8)$$


[Figure: parties A and B each perform local operations and are connected by two-way (or one-way) classical communications.]

Fig. 8.1 Two-way LOCC (or one-way LOCC)


¹When the measurement is employed in the class ∅, it is required that Alice and Bob obtain the
same outcome.
Fig. 8.2 Two partially entangled states (left) and one completely entangled state (right)

[Figure: partially entangled states (left) and maximally entangled states (right).]

Fig. 8.3 Entanglement dilution


If a TP-CP map can be written in the form (8.8), it is called a separable TP-CP
map (S-TP-CP map) [1]. A TP-CP map $\kappa$ is an S-TP-CP map if and only if the
matrix $K(\kappa)$ defined in (5.4) can be regarded as a separable state in the composite
system $(\mathcal{H}_A \otimes \mathcal{H}_{A'}) \otimes (\mathcal{H}_B \otimes \mathcal{H}_{B'})$, where we assume that the map $E_{A,i}$ ($E_{B,i}$) is
a map from $\mathcal{H}_A$ ($\mathcal{H}_B$) to $\mathcal{H}_{A'}$ ($\mathcal{H}_{B'}$). Since the set of separable TP-CP maps forms
a class of localized operations, we denote it by $S$. There are two typical types of
LOCC operation. One is distillation, which converts a partially entangled state to
a maximally entangled state (Fig. 8.2). The other is entanglement dilution, which
converts a maximally entangled state to a given partially entangled state (Fig. 8.3).
The following theorem discusses the possibility of entanglement dilution.


Theorem 8.1 (Lo and Popescu [2]) Let the initial state of a composite system $\mathcal{H}_A \otimes \mathcal{H}_B$
be a known pure state $|x\rangle\langle x|$. LOCC state operations consisting of two-way
classical communications can be realized by state operations consisting of one-way
classical communications from A to B.


Proof For the proof of this theorem, it is sufficient to show that any final state realized 
by operation (1) can be realized by operation (2), where operations (1) and (2) are 
given as follows. In operation (1), we (i) perform a measurement in system B, (ii) 
transmit B’s measured outcome to system A, and (iii) finally apply state evolutions 
to each system. In operation (2), we (i) perform a measurement in system A, (ii) 
transmit A’s measured outcome to system B, and (iii) finally apply state evolutions 
to each system. 

From Theorem 7.2, any operation of class (1) can be described by the state
reduction

$$\Big(I_A \otimes \sqrt{M_i^B}\Big)|x\rangle\langle x|\Big(I_A \otimes \sqrt{M_i^B}\Big) \qquad (8.9)$$

and local TP-CP maps on A and B depending on the measurement datum $i$. Hence, it is
sufficient to prove that the state reduction (8.9) can be realized by a state reduction by a
measurement on A and local TP-CP maps on A and B depending on the measurement
datum $i$.

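Using (1.22) and (A.8), we have

$$\begin{aligned}
\Big(I_A \otimes \sqrt{M_i^B}\Big)|X_x\rangle &= \Big|X_x\sqrt{M_i^B}^{\,T}\Big\rangle = \Big|U_i\sqrt{M_i^B}^{\,T}X_x^*U_i\Big\rangle \\
&= \Big|U_i\sqrt{M_i^B}^{\,T}V^*X_xV^*U_i\Big\rangle = \Big(U_iV^*\big(V\sqrt{M_i^B}^{\,T}V^*\big) \otimes (V^*U_i)^T\Big)|X_x\rangle,
\end{aligned}$$

where $U_i$ and $V$ are unitary matrices satisfying $X_x\sqrt{M_i^B}^{\,T} = U_i\big|X_x\sqrt{M_i^B}^{\,T}\big|$ and
$X_x = V|X_x|$. This equation implies that the state reduction (8.9) is realized by the
state reduction on A by the instrument $\{V\sqrt{M_i^B}^{\,T}V^*\}_i$ and the local unitaries $U_iV^*$
and $(V^*U_i)^T$ depending on the datum $i$ on A and B, respectively. ■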
Exercise


8.1 Let $x$ be the purification of state $\rho$ on $\mathcal{H}_A$. Show that $H(\rho) = H(\mathrm{Tr}_A|x\rangle\langle x|)$.


8.2 Fidelity and Entanglement 


We can characterize the fidelity of two states on $\mathcal{H}_A$ using the purification of mixed
states in the following way.


Lemma 8.2 (Uhlmann [3]) Consider two mixed states $\rho_1$ and $\rho_2$ on $\mathcal{H}_A$. Let $|u_1\rangle\langle u_1|$
and $|u_2\rangle\langle u_2|$ be their purifications, respectively. Then,

$$F(\rho_1,\rho_2)\ \big(:= \mathrm{Tr}\,|\sqrt{\rho_1}\sqrt{\rho_2}|\big) = \max_{u_1,u_2}|\langle u_1|u_2\rangle|, \qquad (8.10)$$

where the max on the RHS is with respect to the purifications of $\rho_1$ and $\rho_2$.


Proof First, we choose the matrix $X_{u_i}$ according to (8.1) in the previous section
as a matrix from the reference system $\mathcal{H}_R$ to the system $\mathcal{H}_A$. (Note that the map
$u \mapsto X_u$ depends upon the basis of $\mathcal{H}_R$.) Since $\rho_i = X_{u_i}X_{u_i}^*$, from (A.9) we obtain
$X_{u_i} = \sqrt{\rho_i}\,U_i$ choosing an appropriate unitary matrix $U_i$ on $\mathcal{H}_R$. From (1.23) and
(A.18) we have

$$|\langle u_1|u_2\rangle| = |\mathrm{Tr}\,X_{u_2}X_{u_1}^*| = |\mathrm{Tr}\,\sqrt{\rho_2}\,U_2U_1^*\sqrt{\rho_1}| = |\mathrm{Tr}\,\sqrt{\rho_1}\sqrt{\rho_2}\,U_2U_1^*| \le \mathrm{Tr}\,|\sqrt{\rho_1}\sqrt{\rho_2}|, \qquad (8.11)$$

which proves the ≥ part of (8.10). The equality follows from the existence of $U_2U_1^*$
satisfying the equality of (8.11). ■
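
For two-level systems the quantity $\mathrm{Tr}\,|\sqrt{\rho_1}\sqrt{\rho_2}|$ in (8.10) has a closed form, because the two singular values $s_1, s_2$ of $\sqrt{\rho_1}\sqrt{\rho_2}$ satisfy $s_1^2 + s_2^2 = \mathrm{Tr}\,\rho_1\rho_2$ and $s_1s_2 = \sqrt{\det\rho_1\det\rho_2}$. The sketch below uses this qubit-only shortcut, which is an assumption of the example rather than the general method; the function names and the sample states are illustrative.

```python
import math

def trace_product(a, b):
    """Tr(a b) for 2 x 2 matrices given as nested lists."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def qubit_fidelity(rho, sigma):
    """F(rho, sigma) = Tr|sqrt(rho) sqrt(sigma)| for 2 x 2 density matrices,
    via F^2 = Tr(rho sigma) + 2 sqrt(det(rho) det(sigma)) (qubit-only shortcut)."""
    tr = complex(trace_product(rho, sigma)).real
    dets = max(complex(det2(rho) * det2(sigma)).real, 0.0)
    return math.sqrt(max(tr + 2 * math.sqrt(dets), 0.0))

zero = [[1, 0], [0, 0]]                # |0><0|
plus = [[0.5, 0.5], [0.5, 0.5]]        # |+><+|
mixed = [[0.75, 0], [0, 0.25]]
uniform = [[0.5, 0], [0, 0.5]]

f_pure = qubit_fidelity(zero, plus)            # overlap |<0|+>|
f_mixed_pure = qubit_fidelity(mixed, plus)     # sqrt(<+|mixed|+>)
f_commuting = qubit_fidelity(mixed, uniform)   # sum_i sqrt(p_i q_i)
```

For two pure states the value reduces to the overlap of the state vectors, and for commuting states to $\sum_i\sqrt{p_iq_i}$ in terms of their eigenvalues, which gives two independent checks of the shortcut.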
From (8.6), for an arbitrary purification $x$ of $\rho_1$, there exists a purification $y$ of $\rho_2$
such that

$$F(\rho_1,\rho_2) = |\langle x|y\rangle| = \langle x|y\rangle, \qquad (8.12)$$

where the second equation follows from choosing a suitable phase factor $e^{i\theta}$ in $y$.
Vectors $v_1,\ldots,v_l$ satisfying $\sum_{i=1}^{l}|v_i\rangle\langle v_i| = \rho$ are called a decomposition of $\rho$.
Using this fact, we obtain the following corollary regarding decompositions.


Corollary 8.1 Let $\rho_1$ and $\rho_2$ be two mixed states on $\mathcal{H}_A$. For an arbitrary decomposition
$u_1,\ldots,u_l$ of $\rho_1$, there exists a decomposition $v_1,\ldots,v_l$ of $\rho_2$ such that
$F(\rho_1,\rho_2) = \sum_{i=1}^{l}\langle u_i|v_i\rangle$.


Proof Let $w_1,\ldots,w_l$ be an orthonormal basis for the space $\mathcal{H}_R$. Let $x = \sum_{i=1}^{l}u_i \otimes w_i$.
Choose a purification $y \in \mathcal{H}_A \otimes \mathcal{H}_R$ of $\rho_2$ satisfying (8.12). Since $w_1,\ldots,w_l$
is an orthonormal basis, there exist appropriate elements $v_1,\ldots,v_l$ of $\mathcal{H}_A$ such that
$y = \sum_{i=1}^{l}v_i \otimes w_i$. Therefore, $|\langle x|y\rangle| = \sum_{i=1}^{l}\langle u_i|v_i\rangle$. ■


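Corollary 8.2 (Uhlmann [3]) Let $\rho = \sum_i p_i\rho_i$ for the states $\rho_i$ and $\sigma$, and the
probability $p_i$. The following concavity holds:

$$F^2(\rho,\sigma) \ge \sum_i p_iF^2(\rho_i,\sigma). \qquad (8.13)$$

If $\sigma$ is a pure state, then
$$F^2(\rho,|u\rangle\langle u|) = \langle u|\rho|u\rangle, \qquad (8.14)$$
and the equality in (8.13) holds.


Proof The validity of (8.14) follows from the fact that $F(\rho,|u\rangle\langle u|) = \mathrm{Tr}\sqrt{|u\rangle\langle u|\rho|u\rangle\langle u|}$.
Let $y$ be the purification of $\sigma$, and $x_i$ be the purification of $\rho_i$ satisfying
$\langle x_i|y\rangle = F(\rho_i,\sigma)$. Then,

$$\sum_i p_iF^2(\rho_i,\sigma) = \sum_i p_i\langle y|x_i\rangle\langle x_i|y\rangle = F^2\Big(\sum_i p_i|x_i\rangle\langle x_i|,\ |y\rangle\langle y|\Big) \le F^2(\rho,\sigma)$$

completes the proof. The last inequality can be proved by the relation for the partial
trace as follows. Two densities $\rho_1$ and $\rho_2$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ satisfy

$$F(\rho_1,\rho_2) = \max_{u_1,u_2}|\langle u_1|u_2\rangle| \le \max_{u_1',u_2'}|\langle u_1'|u_2'\rangle| = F(\mathrm{Tr}_B\,\rho_1,\mathrm{Tr}_B\,\rho_2),$$

where $u_1, u_2$ are purifications of $\rho_1, \rho_2$ and $u_1', u_2'$ are purifications of $\mathrm{Tr}_B\,\rho_1$,
$\mathrm{Tr}_B\,\rho_2$. ■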
By applying Jensen's inequality to the function $x \mapsto -\sqrt{x}$, Corollary 8.2 yields
that

$$F(\rho,\sigma) \ge \sum_i p_iF(\rho_i,\sigma). \qquad (8.15)$$

A stronger statement (strong concavity of the fidelity) than (8.15) holds regarding
the concavity of $F(\rho,\sigma)$.


Corollary 8.3 (Nielsen and Chuang [4]) For states $\rho_i$ and $\sigma_i$ and probabilities $\{p_i\}$
and $\{q_i\}$, the following concavity property holds:

$$F\Big(\sum_i p_i\rho_i,\ \sum_i q_i\sigma_i\Big) \ge \sum_i \sqrt{p_iq_i}\,F(\rho_i,\sigma_i). \qquad (8.16)$$

Proof Let $x_i$ and $y_i$ be the purifications of $\rho_i$ and $\sigma_i$, respectively, satisfying
$F(\rho_i,\sigma_i) = \langle x_i|y_i\rangle$. Consider the space spanned by the orthonormal basis $\{u_i\}$.
The purifications of $\sum_i p_i\rho_i$ and $\sum_i q_i\sigma_i$ are then $x = \sum_i \sqrt{p_i}\,x_i \otimes u_i$ and
$y = \sum_i \sqrt{q_i}\,y_i \otimes u_i$. Therefore,

$$F\Big(\sum_i p_i\rho_i,\ \sum_i q_i\sigma_i\Big) \ge \langle x|y\rangle = \sum_i \sqrt{p_iq_i}\,\langle x_i|y_i\rangle,$$

completing the proof. ■


Monotonicity is the subject of the following corollary. 


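Corollary 8.4 For an arbitrary TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_{A'}$,

$$F(\rho_1,\rho_2) \le F(\kappa(\rho_1),\kappa(\rho_2)). \qquad (8.17)$$

This corollary is called the monotonicity. Further, the monotonicity (5.49), i.e.,
$b(\rho,\sigma) \ge b(\kappa(\rho),\kappa(\sigma))$, can be derived from this.


Proof Choose the Stinespring representation $(\mathcal{H}_C, |u\rangle\langle u|, U)$ of $\kappa$, i.e., choose
$(\mathcal{H}_C, |u\rangle\langle u|, U)$ such that it satisfies $\kappa(\rho) = \mathrm{Tr}_{A,C}\,U(\rho \otimes |u\rangle\langle u|)U^*$. Let two pure
states $u_1$ and $u_2$ be purifications of $\rho_1$ and $\rho_2$ on $\mathcal{H}_A \otimes \mathcal{H}_R$ maximizing the RHS of
(8.10). Since

$$\kappa(\rho_i) = \mathrm{Tr}_{A,C,R}\,(U \otimes I_R)(|u_i\rangle\langle u_i| \otimes |u\rangle\langle u|)(U \otimes I_R)^*,$$

$(U \otimes I_R)(u_i \otimes u)$ is the purification of $\kappa(\rho_i)$; therefore, it satisfies $|\langle u_1 \otimes u|u_2 \otimes u\rangle| =
|\langle u_1|u_2\rangle| = F(\rho_1,\rho_2)$. Then, (8.17) can be obtained from (8.10). ■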


Let us next examine a quantity called the entanglement fidelity, which expresses
how much entanglement is preserved by a TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_A$ [5]. Let R
be the reference system with respect to the CP map $\kappa$ and the mixed state $\rho$ on $\mathcal{H}_A$.
The entanglement fidelity is then defined as

$$F_e(\rho,\kappa) := \sqrt{\langle x|(\kappa \otimes \iota_R)(|x\rangle\langle x|)|x\rangle}, \qquad (8.18)$$

where $x$ is the purification of $\rho$. At first glance, this definition seems to depend on
the choice of the purification $x$. Using the Choi–Kraus representation $\{E_i\}_i$ of $\kappa$, we
can show that (Exercise 8.4) [6]

$$F_e^2(\rho,\kappa) = \langle x|(\kappa \otimes \iota_R)(|x\rangle\langle x|)|x\rangle = \sum_i |\mathrm{Tr}\,E_i\rho|^2. \qquad (8.19)$$

Hence, $F_e(\rho,\kappa)$ is independent of the purification $x$ and of the Choi–Kraus
representation $\{E_i\}_i$. From the monotonicity of the fidelity, we have

$$F_e(\rho,\kappa) \le F(\rho,\kappa(\rho)). \qquad (8.20)$$

The equality holds if $\rho$ is a pure state. The entanglement fidelity satisfies the following
properties,² which will be applied in later sections.


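① Let $\kappa'$ be a TP-CP map from $\mathcal{H}_A$ to $\mathcal{H}_B$, and $\kappa$ be a TP-CP map from $\mathcal{H}_B$ to
$\mathcal{H}_A$. When $\dim\mathcal{H}_A \le \dim\mathcal{H}_B$, given a state $\rho$ on $\mathcal{H}_A$, there exists an isometry
$U$ from $\mathcal{H}_A$ to $\mathcal{H}_B$ such that (Exercise 8.5) [6]

$$F_e^2(\rho,\kappa \circ \kappa') \le F_e(\rho,\kappa \circ \kappa_U). \qquad (8.21)$$

When $\dim\mathcal{H}_A > \dim\mathcal{H}_B$, given a state $\rho$ on $\mathcal{H}_A$, there exist a subspace $\mathcal{H}_C \subset \mathcal{H}_A$
with the dimension $\dim\mathcal{H}_B$ and a unitary $U$ from $\mathcal{H}_C$ to $\mathcal{H}_B$ such that

$$F_e^2(\rho,\kappa \circ \kappa') \le (\mathrm{Tr}\,P_C\rho)\,F_e\Big(\frac{P_C\rho P_C}{\mathrm{Tr}\,P_C\rho},\ \kappa \circ \kappa_U\Big), \qquad (8.22)$$

where $P_C$ is the projection to the subspace $\mathcal{H}_C$ and the minimum is taken with
respect to the projection with the rank $\dim\mathcal{H}_B$.

② If $\rho = \sum_i p_i\rho_i$, we have (Exercise 8.6) [6]

$$F_e^2(\rho,\kappa) \le \sum_i p_iF_e^2(\rho_i,\kappa). \qquad (8.23)$$

In particular, when all the $\rho_i$ are pure states, the following holds [5]:

$$F_e^2(\rho,\kappa) \le \sum_i p_iF^2(\rho_i,\kappa(\rho_i)). \qquad (8.24)$$

③ Let $\mathcal{H}_B$ be a subspace of $\mathcal{H}_A$. Given a real number $a$ such that $1 > a > 0$, there
exists a subspace $\mathcal{H}_C$ of $\mathcal{H}_B$ with a dimension $\lfloor(1 - a)\dim\mathcal{H}_B\rfloor$ such that [6, 7]

$$\max_{x \in \mathcal{H}_C^1}\{1 - F^2(x,\kappa(x))\} \le \frac{1 - F_e^2(\rho_{\mathrm{mix}}^B,\kappa)}{a}, \qquad (8.25)$$

where $\rho_{\mathrm{mix}}^B$ is the completely mixed state on $\mathcal{H}_B$.

④ Let the support of $\rho$ be included in the subspace $\mathcal{H}_B$ of $\mathcal{H}_A$. The following then
holds [6]:

$$\frac{2}{3}\big(1 - F_e^2(\rho,\kappa)\big) \le \max_{x \in \mathcal{H}_B^1}\{1 - F^2(x,\kappa(x))\}. \qquad (8.26)$$

The completely mixed state $\rho_{\mathrm{mix}}$ on $\mathcal{H}$ satisfies

$$\frac{d}{d+1}\big(1 - F_e^2(\rho_{\mathrm{mix}},\kappa)\big) = \mathrm{E}_{\mu,x}\big[1 - F^2(x,\kappa(x))\big], \qquad (8.27)$$

where $\mathrm{E}_{\mu,x}$ denotes the expectation with respect to the pure state $x$ under the
invariant distribution $\mu$ on $\mathcal{H}^1$ and $d$ is the dimension of $\mathcal{H}$.

²A large part of the discussion relating to entanglement fidelity and information quantities relating
to entanglement (to be discussed in later sections) was first done by Schumacher [5].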


Property ① evaluates the entanglement fidelity when we replace the recovery CP
map by a suitable isometry map. The other properties of the entanglement fidelity can
be used for evaluating the fidelities between the input and output states for a given
channel $\kappa$. When we focus on the worst fidelity, (8.25) and (8.26) are useful. When
we focus on the average of the fidelity, (8.24) and (8.27) are useful. In fact, the average
$\sum_i p_iF^2(\rho_i,\kappa(\rho_i))$ depends on the choice of the decomposition $\rho = \sum_i p_i\rho_i$;
however, the entanglement fidelity does not depend on it because the entanglement
fidelity reflects how the map $\kappa$ preserves the coherence of the input states.
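
Formula (8.19) makes the entanglement fidelity easy to evaluate from a Choi–Kraus representation. The following sketch evaluates it for a qubit dephasing channel, which is an assumed example not taken from the text, and compares both sides of (8.27) with a midpoint-rule average over the Bloch sphere; all function names and the value $p = 0.3$ are illustrative.

```python
import cmath
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def entanglement_fidelity_sq(kraus, rho):
    """F_e^2(rho, kappa) = sum_i |Tr E_i rho|^2, following (8.19)."""
    return sum(abs(trace(matmul(E, rho))) ** 2 for E in kraus)

def output_fidelity_sq(kraus, x):
    """F^2(x, kappa(x)) = sum_i |<x|E_i|x>|^2 for a pure input state x."""
    n = len(x)
    total = 0.0
    for E in kraus:
        amp = sum(x[i].conjugate() * E[i][j] * x[j]
                  for i in range(n) for j in range(n))
        total += abs(amp) ** 2
    return total

def dephasing_kraus(p):
    """Assumed example channel: Pauli Z is applied with probability p."""
    a, b = math.sqrt(1 - p), math.sqrt(p)
    return [[[a, 0], [0, a]], [[b, 0], [0, -b]]]

def average_infidelity(kraus, n_z=200, n_phi=16):
    """Midpoint-rule average of 1 - F^2(x, kappa(x)) over qubit pure states."""
    total = 0.0
    for s in range(n_z):
        z = -1 + (2 * s + 1) / n_z      # cos(theta) is uniform on [-1, 1]
        theta = math.acos(z)
        for t in range(n_phi):
            phi = 2 * math.pi * (t + 0.5) / n_phi
            x = [complex(math.cos(theta / 2)),
                 cmath.exp(1j * phi) * math.sin(theta / 2)]
            total += 1 - output_fidelity_sq(kraus, x)
    return total / (n_z * n_phi)

p = 0.3
kraus = dephasing_kraus(p)
fe2_mix = entanglement_fidelity_sq(kraus, [[0.5, 0], [0, 0.5]])
fe2_pure = entanglement_fidelity_sq(kraus, [[1, 0], [0, 0]])
lhs_827 = (2 / 3) * (1 - fe2_mix)       # d / (d + 1) with d = 2
rhs_827 = average_infidelity(kraus)
```

In this example the completely mixed state is left unchanged by the channel, so $F(\rho,\kappa(\rho)) = 1$ while $F_e^2 = 1 - p$, which shows that the inequality (8.20) can be strict for mixed inputs. For the pure input $|0\rangle\langle 0|$ both quantities equal 1.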

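From the definition, for a general CP map $\kappa_i$ and a positive real number $f_i$, we
have

$$\sum_i f_iF_e^2(\rho,\kappa_i) = F_e^2\Big(\rho,\ \sum_i f_i\kappa_i\Big). \qquad (8.28)$$

Therefore, we can define the entanglement fidelity $F_e(\rho,\kappa)$ as

$$F_e^2(\rho,\kappa) := \sum_\omega F_e^2(\rho,\kappa_\omega)\ \Big(= F_e^2\Big(\rho,\ \sum_\omega \kappa_\omega\Big)\Big) \qquad (8.29)$$

for an instrument $\kappa = \{\kappa_\omega\}$ with an input and output $\mathcal{H}_A$ and a state $\rho$ on $\mathcal{H}_A$. Since
$\varepsilon(\rho,\kappa) \le 1 - F_e^2(\rho,\kappa)$ from (8.24), combining these properties gives (7.60) and
(7.61).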

In fact, the purification is useful only for treating a single state. In order to analyze
a mixed state $\rho$ on $\mathcal{H}_A$, we often focus on the probabilistic decomposition of $\rho$; this
is defined as the set $\{(p_i,\rho_i)\}$ satisfying

$$\rho = \sum_i p_i\rho_i,$$

where $p_i$ is a probability distribution and $\rho_i$ is a state on $\mathcal{H}_A$. In a quantum system, the
probabilistic decomposition is not unique for a given mixed state $\rho$. Now, we let $|X\rangle$
be a purification of $\rho$ with the reference system $\mathcal{H}_R$. (Here, we choose the reference
$\mathcal{H}_R$ whose dimension is equal to the rank of $\rho$.) We choose a suitable coordinate
of $\mathcal{H}_R$ so that the reduced density $\mathrm{Tr}_A|X\rangle\langle X|$ is $\rho$. When we perform a POVM
$M = \{M_i\}$ on the reference $\mathcal{H}_R$, the outcome $i$ is obtained with the probability:

$$p_i := \langle X|(I_A \otimes M_i)|X\rangle = \mathrm{Tr}\,M_i\rho = \mathrm{Tr}\,XM_i^TX^*. \qquad (8.30)$$

The final state on $\mathcal{H}_A$ is given as

$$\rho_i := \frac{1}{p_i}\mathrm{Tr}_R\,(I_A \otimes \sqrt{M_i})|X\rangle\langle X|(I_A \otimes \sqrt{M_i}) = \frac{1}{p_i}XM_i^TX^*. \qquad (8.31)$$

Since

$$\sum_i p_i\rho_i = \sum_i XM_i^TX^* = X\Big(\sum_i M_i\Big)^TX^* = XX^* = \rho,$$

any POVM $M$ on $\mathcal{H}_R$ gives a probabilistic decomposition. Conversely, for any
probabilistic decomposition $\{(p_i,\rho_i)\}$ of $\rho$, the matrix $M_i := \big(X^{-1}p_i\rho_i(X^*)^{-1}\big)^T$ on $\mathcal{H}_R$
forms a POVM as

$$\sum_i M_i^T = \sum_i X^{-1}p_i\rho_i(X^*)^{-1} = X^{-1}\rho(X^*)^{-1} = I.$$

Moreover, this POVM $\{M_i\}$ satisfies (8.30) and (8.31). Hence, we obtain the following
lemma.


Lemma 8.3 Any probabilistic decomposition $\{(p_i,\rho_i)\}$ of $\rho$ is given by a POVM $M$
on the reference system as (8.30) and (8.31).
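
A small numerical instance may help to see how a measurement on the reference produces a decomposition as in (8.30) and (8.31). The sketch below assumes a real diagonal qubit state with $X = \sqrt{\rho}$ and a projective measurement in the $|\pm\rangle$ basis, so that all matrices are real and the adjoint reduces to the transpose; these choices and the helper names are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Product of two 2 x 2 real matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def decomposition_from_povm(X, povm):
    """Return [(p_i, rho_i)] with p_i rho_i = X M_i^T X^* (real matrices only,
    so X^* is the transpose of X)."""
    parts = []
    for M in povm:
        unnormalized = matmul(matmul(X, transpose(M)), transpose(X))
        p = unnormalized[0][0] + unnormalized[1][1]
        parts.append((p, [[entry / p for entry in row] for row in unnormalized]))
    return parts

rho = [[0.75, 0.0], [0.0, 0.25]]
X = [[math.sqrt(0.75), 0.0], [0.0, 0.5]]          # X X^* = rho
povm = [[[0.5, 0.5], [0.5, 0.5]],                 # |+><+|
        [[0.5, -0.5], [-0.5, 0.5]]]               # |-><-|

parts = decomposition_from_povm(X, povm)
# Recombine the pieces: sum_i p_i rho_i should reproduce rho.
recombined = [[sum(p * r[i][j] for p, r in parts) for j in range(2)]
              for i in range(2)]
determinants = [r[0][0] * r[1][1] - r[0][1] * r[1][0] for _, r in parts]
```

Here the diagonal state is decomposed into two equally likely pure states that are not orthogonal to each other, which differs from its spectral decomposition and illustrates the non-uniqueness noted above.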


Indeed, using this discussion, we can characterize the TP-CP map to the environment
based on the output state $(\kappa \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)$ of the given channel $\kappa$ as
follows. In this case, since the initial state of the total system of the reference system,
the output system, and the environment system is pure, its final state is also
pure. That is, the final state of the total system is given as the purification $|u\rangle\langle u|$ of
$(\kappa \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)$. Since any state $\rho$ can be described as $d_A\mathrm{Tr}_R\,(I_A \otimes \rho^T)|\Phi_d\rangle\langle\Phi_d|$,
the output state with the input state $\rho$ on $\mathcal{H}_A$ is given as

$$d_A\mathrm{Tr}_{A,R}\,(I_{A,E} \otimes \rho^T)|u\rangle\langle u|. \qquad (8.32)$$


Exercises


8.2 Show that $1 - (\mathrm{Tr}\,|\sqrt{\rho}\sqrt{\sigma}|)^2 \ge d_1^2(\rho,\sigma)$ using (8.10) and Exercise 3.18 for two
mixed states $\rho$ and $\sigma$.

8.3 Show that

$$F^2(\rho,\sigma) \le \mathrm{Tr}\,\sqrt{\rho}\sqrt{\sigma} \le F(\rho,\sigma) \qquad (8.33)$$

using the purifications and the monotonicity of $\phi(1/2|\rho,\sigma)$.

8.4 Prove (8.19) noting that $\mathrm{Tr}\,(E_i \otimes I)|x\rangle\langle x| = \mathrm{Tr}_A\,E_i\rho$.


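8.5 Prove property ① of the entanglement fidelity by following the steps below.
(a) Show that there exist Choi–Kraus representations $\{E_i\}_i$ and $\{A_j\}_j$ of $\kappa$ and $\kappa'$,
respectively, such that the matrix $\{\mathrm{Tr}\,E_iA_j\rho\}_{i,j}$ can be written in diagonal form with
positive and real diagonal elements.

(b) Using (a) and (8.19), show that there exist a matrix $A$ and a Choi–Kraus
representation $\{E_i\}_i$ of $\kappa$ such that $\mathrm{Tr}\,A\rho A^* = 1$ and $F_e^2(\rho,\kappa \circ \kappa') \le |\mathrm{Tr}\,E_1A\rho|^2$.

(c) Let $E$ be a matrix from $\mathcal{H}_B$ to $\mathcal{H}_A$. Assume that $E^*E \le I$ and $\mathrm{Tr}\,A\rho A^* = \mathrm{Tr}\,\rho = 1$.
Take $U^*$ to be partially isometric under the polar decomposition $E = U|E|$. Show
that $|\mathrm{Tr}\,EA\rho|^2 \le \mathrm{Tr}\,U|E|U^*\rho = \mathrm{Tr}\,EU^*\rho$.

(d) Assume that $\dim\mathcal{H}_A \le \dim\mathcal{H}_B$. Take the polar decomposition $E_1 = U|E_1|$ such
that $U^*$ is an isometry from $\mathcal{H}_A$ to $\mathcal{H}_B$. Show that $F_e^2(\rho,\kappa \circ \kappa') \le F_e(\rho,\kappa \circ \kappa_{U^*})$.

(e) Assume that $\dim\mathcal{H}_A > \dim\mathcal{H}_B$. Take the polar decomposition $E_1 = U|E_1|$
such that $U$ is an isometry from $\mathcal{H}_B$ to $\mathcal{H}_A$. Then, choose the subspace $\mathcal{H}_C \subset \mathcal{H}_A$
as the range of $U$. So, $U$ can be regarded as a unitary from $\mathcal{H}_B$ to $\mathcal{H}_C$. Show that

$$F_e^2(\rho,\kappa \circ \kappa') \le (\mathrm{Tr}\,P_C\rho)\,F_e\Big(\frac{P_C\rho P_C}{\mathrm{Tr}\,P_C\rho},\ \kappa \circ \kappa_{U^*}\Big).$$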


8.6 Show ©, using (8.19) and showing the fact that the function $\rho \mapsto |\operatorname{Tr} A\rho|^2$ is a
convex function.


8.7 Prove © by following the steps below. As the first step, determine the orthogonal 
basis x1,...,%q Of Hg inductively. Let x; be the vector argmax , <7! {1 — F(x, 
&(x))}. Given x1,...,x;, let 71; be the orthogonal complement space to the space 
spanned by x,,..., x;. Let xj, be argmax,, <7! {1 — F(x, K(x))}. Then, let 7c be 
the space spanned by xa,,..., Xdg—dce-+1, Where dc = |(1 — a) dim 71g]. Show that 
the space 7{c satisfies (8.25) using Markov’s inequality and @. 


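8.8 Show (8.26) by following the steps below.

(a) Show that $F_e^2(\rho, \kappa) = \sum_{i,j} p_i p_j \langle u_i|\kappa(|u_i\rangle\langle u_j|)|u_j\rangle$ for $\rho = \sum_i p_i |u_i\rangle\langle u_i|$,
where $p_1 \ge p_2 \ge \ldots \ge p_d$.

(b) Let $\phi = (\phi_1, \ldots, \phi_d)$. Define $u(\phi) \stackrel{\rm def}{=} \sum_j \sqrt{p_j}\, e^{i\phi_j} u_j$. Show that
$F_e^2(\rho, \kappa) + \sum_{j \neq k} p_j p_k \langle u_k|\kappa(|u_j\rangle\langle u_j|)|u_k\rangle$ is equal to the expectation of $F^2(u(\phi), \kappa(u(\phi)))$
under the uniform distribution with respect to $\phi = (\phi_1, \ldots, \phi_d)$.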
(c) Let 6 be the RHS of (8.26). Show that 


d 
SD) Padua (lee) (ur laa) < p24, 
k=2 


d 
See ug lre(le)(w Dla) < So puprd. 
j=2 


j=2 kAj 
(d) Show (8.26) using (a) to (c).


8.9 Show that the equality of (8.26) holds when $\kappa$ is a depolarizing channel
for a quantum two-level system and $\rho$ is the completely mixed state $\rho_{\rm mix}$.


8.10 Prove (8.27) following the steps below. 
(a) Prove (8.27) when « is a depolarizing channel. 
(b) Given a channel $\kappa$, we choose the depolarizing channel $\kappa_{d,\lambda}$ as

$$\kappa_{d,\lambda}(\rho) = \int_{SU(d_A)} U^* \kappa(U \rho U^*) U \,\nu(dU),$$


where (dU) is the invariant distribution. Show that E,, , F 2(x, K(x)) = E, F?(y, 
Ka,\(y)) for any element y € H4, where E,,, is the expectation with respect to the 
pure state x under the invariant distribution j. 

(c) Show that $F_e(\rho_{\rm mix}, \kappa) = F_e(\rho_{\rm mix}, \kappa_{d,\lambda})$.

(d) Prove (8.27) for any channel $\kappa$.


8.11 Verify (8.28). 


8.12 Show the following for the states $\rho_i$ in (8.31) when the state $\rho$ is full rank and
the reference system $\mathcal{H}_R$ has the same dimension as $\mathcal{H}_A$.

(a) Show that the states $\rho_i$ in (8.31) are pure if and only if $\operatorname{rank} M_i = 1$.

(b) Show that the states $\rho_i$ in (8.31) are orthogonal to each other if and only if the
POVM $M = \{M_i\}$ is a PVM and commutative with $(X^* X)^T$.


8.13 Let & be a TP-CP map from C/ to C” and x’ be a TP-CP map from C“ to C4. 
Show that F.(Omix, #/ 06) < Je 


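8.14 Let $\rho$ be a bipartite state on $\mathcal{H}_A \otimes \mathcal{H}_B$. Show that the state $\rho$ is separable if and
only if $\rho$ has a purification $\sum_i \sqrt{p_i}\,|u_i^A \otimes u_i^B \otimes u_i^R\rangle$ with the reference system $\mathcal{H}_R$
such that $\{|u_i^R\rangle\}$ is a CONS of $\mathcal{H}_R$.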


8.3 Entanglement and Information Quantities


So far, we have examined the transmission information for a classical-quantum
channel, but not the quantum version of the mutual information $I(X : Y)$, defined by
(2.30) in Sect. 2.1.1. In Sect. 5.5, we defined the quantum mutual information
$I_\rho(A : B)$ as

$$I_\rho(A : B) = H_\rho(A) + H_\rho(B) - H_\rho(AB) = D(\rho\|\rho^A \otimes \rho^B) \tag{8.34}$$

with respect to a state $\rho$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ for quantum systems $\mathcal{H}_A$ and $\mathcal{H}_B$. We used the
notation introduced in Sect. 5.5 for the second expression above. Confusingly, the
transmission information $I(p, W)$ for classical-quantum channels is also occasionally
called the quantum mutual information. However, since $I_\rho(A : B)$ is a more
natural generalization of the mutual information defined in (2.30), we shall call the
quantity $I_\rho(A : B)$ the quantum mutual information in this text.

As discussed in Sect. 2.1.1, there is a precise relationship between the mutual
information and the transmission information for classical systems. Similarly, there
is a relationship between the classical-quantum transmission information and the
quantum mutual information. To see this relation, let us consider a classical-quantum
channel $W$ with an input system $\mathcal{X}$ and an output system $\mathcal{H}_A$. Let $\{u_x\}$ be the
orthonormal basis states of the Hilbert space $\mathcal{H}_X$. Let us consider a state on the composite
system $\mathcal{H}_X \otimes \mathcal{H}_A$ given by $\rho = \sum_x p_x |u_x\rangle\langle u_x| \otimes W_x$, where $p$ is a probability
distribution on $\mathcal{X}$. The quantum mutual information is then given by $I_\rho(X : A) = I(p, W)$.
Therefore, this is equal to the transmission information of a classical-quantum
channel.
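
As a numerical illustration of $I_\rho(X : A) = I(p, W)$, the sketch below builds the classical-quantum state for a hypothetical three-letter input alphabet with randomly chosen qubit outputs $W_x$ and compares (8.34) with $H(\sum_x p_x W_x) - \sum_x p_x H(W_x)$. It assumes NumPy and natural logarithms; the helper names are illustrative.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (natural logarithm)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def random_state(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])              # input distribution (assumed example)
W = [random_state(2, rng) for _ in p]      # output states W_x

# rho = sum_x p_x |u_x><u_x| (x) W_x on H_X (x) H_A
rho_XA = np.zeros((6, 6), dtype=complex)
for x, (px, Wx) in enumerate(zip(p, W)):
    proj = np.zeros((3, 3))
    proj[x, x] = 1.0
    rho_XA += px * np.kron(proj, Wx)

rho_X = np.diag(p)
rho_A = sum(px * Wx for px, Wx in zip(p, W))

mutual_info = entropy(rho_X) + entropy(rho_A) - entropy(rho_XA)
transmission_info = entropy(rho_A) - sum(px * entropy(Wx) for px, Wx in zip(p, W))
print(mutual_info, transmission_info)
```

The two printed values agree because $H_\rho(XA) = H(p) + \sum_x p_x H(W_x)$ for a state that is block diagonal in the classical register.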

It is possible to find a connection between the transmission information and the
quantum mutual information of a classical-quantum channel by appropriately defining
the composite system. Let us now define the transmission information of the
quantum-quantum channel $\kappa$ (which is a TP-CP map) from the quantum mutual information
using a similar method. Here it is necessary to find the quantum-mechanical
correlation between the input and output systems. For this purpose, similar to the
entanglement fidelity, we consider the purification $x$ of the state $\rho$ on the input
system $\mathcal{H}_A$ because the final state of the purification $x$ reflects how the map $\kappa$ preserves
the coherence of the input states. The transmission information $I(\rho, \kappa)$
of the quantum-quantum channel $\kappa$ can then be defined using the quantum mutual
information as [8]

$$I(\rho, \kappa) \stackrel{\rm def}{=} I_{(\kappa \otimes \iota_R)(|x\rangle\langle x|)}(R : B), \tag{8.35}$$

where $R$ is the reference system and $B$ is the output system. Since $H(\rho)$ is equal to
the entropy of the reference system, this can also be written as

$$I(\rho, \kappa) = H(\kappa(\rho)) + H(\rho) - H((\kappa \otimes \iota_R)(|x\rangle\langle x|)). \tag{8.36}$$


This quantity will play an important role in Sect. 9.3. 

Let us now consider the following quantity called the coherent information,
which expresses how much coherence is preserved through a quantum-quantum
channel $\kappa$ [9]:

$$I_c(\rho, \kappa) \stackrel{\rm def}{=} H(\kappa(\rho)) - H((\kappa \otimes \iota_R)(|x\rangle\langle x|)) = -H_{(\kappa \otimes \iota_R)(|x\rangle\langle x|)}(R|B) \tag{8.37}$$

for a TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$, a state $\rho$ on $\mathcal{H}_A$, and a purification $x$ of $\rho$.
Therefore, the coherent information is equal to the negative conditional entropy.
Of course, in the classical case, the conditional entropy can only take either positive
values or 0. Therefore, a negative conditional entropy indicates the existence of some
quantum features in the system. For example, in an entanglement-breaking channel,
the conditional entropy is nonnegative, as can be seen in (8.62).
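
To make (8.36) and (8.37) concrete, here is a minimal sketch that evaluates both quantities for a channel given by Kraus operators. It assumes NumPy, natural logarithms, and a qubit dephasing channel chosen purely as an example; the function names are hypothetical and not from the text.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (natural logarithm)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def purification(rho):
    """Vector |x> = sum_ij (sqrt rho)_ij |i>_A |j>_R, so that Tr_R |x><x| = rho."""
    w, V = np.linalg.eigh(rho)
    sqrt_rho = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
    return sqrt_rho.reshape(-1)

def channel_output(kraus, rho):
    return sum(E @ rho @ E.conj().T for E in kraus)

def joint_output(kraus, rho):
    """(kappa (x) iota_R)(|x><x|) for a purification x of rho."""
    d = rho.shape[0]
    x = purification(rho)
    xx = np.outer(x, x.conj())
    lifted = [np.kron(E, np.eye(d)) for E in kraus]
    return sum(L @ xx @ L.conj().T for L in lifted)

def coherent_information(kraus, rho):          # (8.37)
    return entropy(channel_output(kraus, rho)) - entropy(joint_output(kraus, rho))

def transmission_information(kraus, rho):      # (8.36)
    return entropy(rho) + coherent_information(kraus, rho)

def dephasing(q):
    """Example channel: apply Pauli Z with probability q."""
    Z = np.diag([1.0, -1.0])
    return [np.sqrt(1.0 - q) * np.eye(2), np.sqrt(q) * Z]

rho_mix = np.eye(2) / 2
for q in (0.0, 0.1, 0.5):
    print(q, coherent_information(dephasing(q), rho_mix),
          transmission_information(dephasing(q), rho_mix))
```

For the noiseless case `q = 0.0` the coherent information equals $H(\rho) = \log 2$, and for complete dephasing (`q = 0.5`) it drops to zero, in line with the interpretation of $I_c$ as preserved coherence.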
The coherent information can be related to the entanglement fidelity if
$\sqrt{2(1 - F_e(\rho, \kappa))} \le 1/e$ as follows (Exercises 8.16 and 8.17) [6]:

$$0 \le H(\rho) - I_c(\rho, \kappa) \le \sqrt{2(1 - F_e(\rho, \kappa))}\,\bigl(3\log d - 2\log\sqrt{2(1 - F_e(\rho, \kappa))}\bigr). \tag{8.38}$$

The first inequality holds without any assumption. Therefore, we can expect that the
difference between $H(\rho)$ and the coherent information $I_c(\rho, \kappa)$ will express how the
TP-CP map $\kappa$ preserves the coherence. This will be justified in Sect. 9.6.

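The above information quantities also satisfy the monotonicity [8, 9]

$$I_c(\rho, \kappa' \circ \kappa) \le I_c(\rho, \kappa), \tag{8.39}$$
$$I(\rho, \kappa' \circ \kappa) \le I(\rho, \kappa), \tag{8.40}$$
$$I(\rho, \kappa \circ \kappa') \le I(\kappa'(\rho), \kappa). \tag{8.41}$$

If $U$ is an isometric matrix, then the coherent information satisfies (Exercise 8.22) [9, 10]

$$I_c(\rho, \kappa \circ \kappa_U) = I_c(U \rho U^*, \kappa). \tag{8.42}$$

If $\kappa = \sum_i p_i \kappa_i$, these quantities satisfy the convexity for channels [8, 10]

$$I_c(\rho, \kappa) \le \sum_i p_i I_c(\rho, \kappa_i), \tag{8.43}$$
$$I(\rho, \kappa) \le \sum_i p_i I(\rho, \kappa_i). \tag{8.44}$$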


The transmission information satisfies the concavity for states [8]

$$I\Bigl(\sum_{i=1}^k p_i \rho_i, \kappa\Bigr) \ge \sum_{i=1}^k p_i I(\rho_i, \kappa). \tag{8.45}$$

Conversely, the following reverse inequality also holds:

$$I\Bigl(\sum_{i=1}^k p_i \rho_i, \kappa\Bigr) \le \sum_{i=1}^k p_i I(\rho_i, \kappa) + 2\log k. \tag{8.46}$$


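Let $\kappa^A$ ($\kappa^B$) be a TP-CP map from $\mathcal{H}_A$ ($\mathcal{H}_B$) to $\mathcal{H}_{A'}$ ($\mathcal{H}_{B'}$). Let $\rho^{A,B}$ be a state
on $\mathcal{H}_A \otimes \mathcal{H}_B$. Let $\rho^A$ and $\rho^B$ be the partially traced states of $\rho^{A,B}$. The transmission
information of a quantum-quantum channel then satisfies

$$I(\rho^{A,B}, \kappa^A \otimes \kappa^B) \le I(\rho^A, \kappa^A) + I(\rho^B, \kappa^B) \tag{8.47}$$

in a similar way to (4.5) for the transmission information of a classical-quantum
channel [8].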
In addition to the types of information defined up until now, we may also define
the pseudocoherent information

$$\tilde{I}_c(\rho, \kappa) \stackrel{\rm def}{=} H(\rho) - H((\kappa \otimes \iota_R)(|x\rangle\langle x|)). \tag{8.48}$$

Although it is difficult to interpret the above quantity as information, it does possess
the following useful properties [11], which will be used in Sect. 9.3.

$$\tilde{I}_c(\rho, \kappa \circ \kappa') \le \tilde{I}_c(\kappa'(\rho), \kappa), \tag{8.49}$$
$$\tilde{I}_c\Bigl(\sum_j p_j \rho_j, \kappa\Bigr) \ge \sum_j p_j \tilde{I}_c(\rho_j, \kappa). \tag{8.50}$$


The first property (8.49) is the monotonicity, and can be derived immediately from
property (8.41) and definitions. The second inequality (8.50) is the concavity with
respect to a state. The following reverse inequality also holds, i.e.,

$$\tilde{I}_c\Bigl(\sum_{j=1}^k p_j \rho_j, \kappa\Bigr) \le \sum_{j=1}^k p_j \tilde{I}_c(\rho_j, \kappa) + \log k. \tag{8.51}$$

The derivations for (8.50) and (8.51) are rather difficult (Exercises 8.24 and 8.25).
We can also obtain the following relationship by combining (8.49) and (8.50):

$$\tilde{I}_c\Bigl(\sum_j p_j \kappa_j(\rho), \kappa\Bigr) \ge \sum_j p_j \tilde{I}_c(\rho, \kappa \circ \kappa_j). \tag{8.52}$$


Finally, we focus on the entropy $H((\kappa \otimes \iota_R)(|x\rangle\langle x|))$, which is called the entropy
exchange [5] and is denoted by $H_e(\kappa, \rho)$. This is equal to the entropy of the environment
system $\mathcal{H}_E$ after the state $\rho$ is transmitted. Its relationship to the entanglement
fidelity $F_e(\rho, \kappa)$ is given by the quantum Fano inequality as [5]

$$H_e(\rho, \kappa) \le h(F_e^2(\rho, \kappa)) + (1 - F_e^2(\rho, \kappa)) \log(d^2 - 1), \tag{8.53}$$

where $d$ is the dimension of $\mathcal{H}_A$.
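
The quantum Fano inequality (8.53) can be checked numerically for qubit Pauli channels. The sketch below assumes NumPy, natural logarithms, and $d = 2$; `fano_sides` is a hypothetical helper that returns the entropy exchange and the right-hand side of (8.53), with $F_e^2 = \langle x|(\kappa \otimes \iota_R)(|x\rangle\langle x|)|x\rangle$.

```python
import numpy as np

PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def binary_entropy(f):
    return float(sum(-t * np.log(t) for t in (f, 1.0 - f) if t > 1e-12))

def fano_sides(probs, rho):
    """Entropy exchange and Fano bound for the Pauli channel with weights probs."""
    d = rho.shape[0]
    w, V = np.linalg.eigh(rho)
    sqrt_rho = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
    x = sqrt_rho.reshape(-1)                      # purification of rho
    xx = np.outer(x, x.conj())
    out = sum(p * np.kron(P, np.eye(d)) @ xx @ np.kron(P, np.eye(d)).conj().T
              for p, P in zip(probs, PAULIS))
    H_e = entropy(out)                            # entropy exchange
    F2 = float(np.real(x.conj() @ out @ x))       # squared entanglement fidelity
    bound = binary_entropy(F2) + (1.0 - F2) * np.log(d**2 - 1)
    return H_e, bound

# Depolarizing-type weights on the completely mixed state: the bound is attained.
H_e, bound = fano_sides([0.7, 0.1, 0.1, 0.1], np.eye(2) / 2)
# A generic Pauli channel and a non-uniform input: strict inequality is expected.
H_e2, bound2 = fano_sides([0.7, 0.2, 0.1, 0.0], np.diag([0.8, 0.2]))
print(H_e, bound, H_e2, bound2)
```

In the first case the joint output state has eigenvalues $(F_e^2, \frac{1-F_e^2}{3}, \frac{1-F_e^2}{3}, \frac{1-F_e^2}{3})$, which is exactly the situation in which (8.53) holds with equality.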
Exercises 


8.15 Let $\mathcal{H}_{E'}$ be the environment system after performing a state evolution given
by the TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_{A'}$. Let $x$ be the purification of the state $\rho$ on $\mathcal{H}_A$.
Let the reference system be $\mathcal{H}_R$. Show that

$$I_c(\rho, \kappa) = H_{x'}(A') - H_{x'}(E'), \tag{8.54}$$
$$I(\rho, \kappa) = H_{x'}(A') + H_{x'}(A'E') - H_{x'}(E'),$$

where $x'$ is the final state of $x$.


3 Since the form of the inequality (8.53) is similar to the Fano inequality, it is called the quantum Fano
inequality. However, it cannot be regarded as a quantum extension of the Fano inequality (2.35).
The relationship between the two formulas is still unclear.


8.16 Show the first inequality in (8.38) by considering the Stinespring representation
of $\kappa$ and (5.86) with respect to the composite system of the environment system $E$
and the reference system $R$.


8.17 Show the second inequality of (8.38) by considering the purification of $\rho$ and
Fannes inequality (Theorem 5.12).


8.18 Prove (8.39) based on the Stinespring representations of $\kappa$ and $\kappa'$ and the strong
subadditivity (5.83) of the von Neumann entropy.


8.19 Prove (8.40) by following the steps below.
(a) Let $|x\rangle$ be a purification of $\rho$ with the reference system $\mathcal{H}_R$. Show that

$$I(\rho, \kappa) = D\bigl((\kappa \otimes \iota_R)(|x\rangle\langle x|)\,\big\|\,\kappa(\rho) \otimes \operatorname{Tr}_A |x\rangle\langle x|\bigr). \tag{8.55}$$

(b) Show (8.40).


8.20 Prove (8.43) and (8.44) using the concavity (5.88) of the conditional entropy.


8.21 Prove (8.41) based on (8.55) and the monotonicity of the quantum relative
entropy by considering the Stinespring representation of $\kappa'$.


8.22 Prove (8.42).


8.23 Let $x$ be the purification of $\rho$ with respect to the reference system $\mathcal{H}_R$. Let $\mathcal{H}_{E'_A}$
and $\mathcal{H}_{E'_B}$ be the environment systems after the state evolutions $\kappa^A$ and $\kappa^B$. Let $x'$ be
the final state of $x$. Show (8.47) by following the steps below.
(a) Show the following, using Exercise 8.15.

$$I(\rho^A, \kappa^A) = H_{x'}(A') + H_{x'}(A'E'_A) - H_{x'}(E'_A),$$
$$I(\rho, \kappa^A \otimes \kappa^B) = H_{x'}(A'B') + H_{x'}(A'B'E'_A E'_B) - H_{x'}(E'_A E'_B).$$

(b) Show that

$$I(\rho^A, \kappa^A) + I(\rho^B, \kappa^B) - I(\rho, \kappa^A \otimes \kappa^B)$$
$$= H_{x'}(A') + H_{x'}(B') - H_{x'}(A'B') - \bigl(H_{x'}(E'_A) + H_{x'}(E'_B) - H_{x'}(E'_A E'_B)\bigr)$$
$$\quad + \bigl(H_{x'}(A'E'_A) + H_{x'}(B'E'_B) - H_{x'}(A'E'_A B'E'_B)\bigr).$$

(c) Prove (8.47) by combining (8.34) with (b).


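8.24 Let $\kappa$ be the state evolution from $\mathcal{H}_A$ to $\mathcal{H}_{A'}$. Show (8.50) following
the steps below.
(a) Let $x_j$ be the purification of $\rho_j$ with respect to the reference system $\mathcal{H}_R$. Let
$\{u_j\}$ be an orthonormal basis of another system $\mathcal{H}_{R'}$. Show that the pure state
$x \stackrel{\rm def}{=} \sum_j \sqrt{p_j}\, x_j \otimes u_j$ on $\mathcal{H}_A \otimes \mathcal{H}_R \otimes \mathcal{H}_{R'}$ is the purification of $\rho \stackrel{\rm def}{=} \sum_j p_j \rho_j$.
(b) Show that the pinching $\kappa_E$ of the measurement $E = \{|u_j\rangle\langle u_j|\}$ on $\mathcal{H}_{R'}$ satisfies

$$D\bigl((\kappa_E \otimes \iota_{A',R})(\kappa \otimes \iota_{R,R'})(|x\rangle\langle x|)\,\big\|\,(\kappa_E \otimes \iota_{A',R})(\kappa(\rho) \otimes \operatorname{Tr}_A |x\rangle\langle x|)\bigr)$$
$$= H(\kappa(\rho)) + \sum_j p_j H(\rho_j) - \sum_j p_j H\bigl((\kappa \otimes \iota_R)(|x_j\rangle\langle x_j|)\bigr).$$

(c) Prove (8.50) by considering the monotonicity of the quantum relative entropy for
the pinching $\kappa_E$.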


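8.25 Prove (8.51) using the same symbols as Exercise 8.24 by following the steps
below.
(a) Show that

$$\sum_{j=1}^k p_j \tilde{I}_c(\rho_j, \kappa) = H\bigl(\kappa_E(\operatorname{Tr}_A |x\rangle\langle x|)\bigr) - H\bigl((\kappa_E \otimes \iota_{A',R})(\kappa \otimes \iota_{R,R'})(|x\rangle\langle x|)\bigr).$$

(b) Verify that

$$\tilde{I}_c(\rho, \kappa) - \sum_{j=1}^k p_j \tilde{I}_c(\rho_j, \kappa)$$
$$= H(\operatorname{Tr}_A |x\rangle\langle x|) - H\bigl(\kappa_E(\operatorname{Tr}_A |x\rangle\langle x|)\bigr) - H\bigl((\kappa \otimes \iota_{R,R'})(|x\rangle\langle x|)\bigr)$$
$$\quad + H\bigl((\kappa_E \otimes \iota_{A',R})(\kappa \otimes \iota_{R,R'})(|x\rangle\langle x|)\bigr)$$
$$\le H\bigl((\kappa_E \otimes \iota_{A',R})(\kappa \otimes \iota_{R,R'})(|x\rangle\langle x|)\bigr) - H\bigl((\kappa \otimes \iota_{R,R'})(|x\rangle\langle x|)\bigr). \tag{8.56}$$

(c) Prove (8.51) using (5.81) and the above results.

8.26 Prove (8.45) using (8.50) and (5.77).

8.27 Prove (8.46) using (8.51) and (5.79).

8.28 Show that

$$\max\{H(\rho) \mid \langle u|\rho|u\rangle = f\} = h(f) + (1 - f)\log(d - 1) \tag{8.57}$$

for a pure state $|u\rangle\langle u|$ on $\mathcal{H}$ ($\dim\mathcal{H} = d$). Then, prove (8.53) using this result.

8.29 Show that

$$H_e(\kappa_p, \rho_{\rm mix}) = H(p), \quad H_e(\kappa_p, |e_0\rangle\langle e_0|) = H_e(\kappa_p, |e_1\rangle\langle e_1|) = h(p_0 + p_3)$$

for the entropy exchange of a Pauli channel $\kappa_p$.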
8.4 Entanglement and Majorization


In this section we consider what kind of state evolutions are possible using only local
quantum operations and classical communications given an entangled state between
two systems. Before tackling this problem, let us first consider a partial ordering
called majorization defined between two $d$-dimensional vectors $a = (a_i)$, $b = (b_i)$
with positive real-number components. This will be useful in the discussion that
follows. If $a$ and $b$ satisfy

$$\sum_{j=1}^k a_j^{\downarrow} \le \sum_{j=1}^k b_j^{\downarrow} \quad (1 \le k \le d), \qquad \sum_{j=1}^d a_j^{\downarrow} = \sum_{j=1}^d b_j^{\downarrow},$$

we say that $b$ majorizes $a$, which we denote as $a \preceq b$. In the above, $(a_j^{\downarrow})$ and $(b_j^{\downarrow})$
are the reordered versions of the elements of $a$ and $b$, respectively, largest first. If
$x \preceq y$ and $y \preceq x$, we represent it as $x \cong y$. If $\frac{1}{\sum_i x_i} x \cong \frac{1}{\sum_i y_i} y$, we write $x \approx y$.


If sat = = sa ; > We represent it as x « y. The following theorem discusses the 
properties of this partial ordering. The relation with entanglement will be discussed 
after this theorem. 


Theorem 8.2 The following conditions for two $d$-dimensional vectors $x = (x_i)$ and
$y = (y_i)$ with positive real components are equivalent [12].

① $x \preceq y$.

② There exists a finite number of T-transforms $T_1, \ldots, T_n$ such that $x = T_n \cdots T_1 y$.
A T-transform is defined according to a matrix $A = (a^{i,j})$ satisfying $a^{i_1,i_1} =
a^{i_2,i_2} = 1 - t$ and $a^{i_1,i_2} = a^{i_2,i_1} = t$ for some pair $i_1$ and $i_2$, and $a^{i,j} = \delta_{i,j}$
otherwise, where $t$ is a real number with $0 \le t \le 1$.

③ There exists a double stochastic matrix $A$ such that $x = Ay$.

④ There exists a stochastic matrix $B = (b^{i,j})$ such that $(B^j)^T \circ x \approx y$ for all
integers $j$. $(B^j)^T$ is the column vector obtained by transposing $B^j$. The product
of the two vectors $x$ and $y$ is defined as $(y \circ x)_i \stackrel{\rm def}{=} y_i x_i$.


The product $\circ$ satisfies the associative law. A vector $e$ with each of its components
equal to 1 satisfies $e \circ x = x$ and $\sum_j (B^j)^T = e$.


From the concavity of the entropy, we can show that a T-transform $T$ and a
probability distribution $p$ satisfy $H(T(p)) \ge H(p)$. Therefore, if $q \preceq p$, then

$$H(q) \ge H(p). \tag{8.58}$$

Since a double stochastic matrix $Q$ and a probability distribution $p$ satisfy $Q(p) \preceq p$,
we have

$$H(Q(p)) \ge H(p), \tag{8.59}$$

from which we obtain (2.27).
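
The relations among T-transforms, double stochastic matrices, majorization, and entropy can be illustrated with a short sketch. It assumes NumPy and natural logarithms; `majorized_by`, `t_transform`, and `shannon` are illustrative helper names, and the numerical vectors are arbitrary examples.

```python
import numpy as np

def majorized_by(a, b, tol=1e-12):
    """True if a is majorized by b: partial sums of sorted a never exceed those of b."""
    a = np.sort(np.asarray(a, float))[::-1]
    b = np.sort(np.asarray(b, float))[::-1]
    return bool(abs(a.sum() - b.sum()) <= tol
                and np.all(np.cumsum(a) <= np.cumsum(b) + tol))

def t_transform(d, i1, i2, t):
    """Matrix of a T-transform mixing components i1 and i2 with weight t."""
    T = np.eye(d)
    T[i1, i1] = T[i2, i2] = 1.0 - t
    T[i1, i2] = T[i2, i1] = t
    return T

def shannon(p):
    p = np.asarray(p, float)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

p = np.array([0.6, 0.3, 0.1])
q = t_transform(3, 0, 2, 0.25) @ p           # one T-transform applied to p

# A double stochastic matrix as a convex mixture of permutation matrices.
perm_cycle = np.roll(np.eye(3), 1, axis=0)
Q = 0.5 * np.eye(3) + 0.3 * perm_cycle + 0.2 * perm_cycle @ perm_cycle
r = Q @ p

print(q, majorized_by(q, p), shannon(q) >= shannon(p))
print(r, majorized_by(r, p), shannon(r) >= shannon(p))
```

Both `q` and `r` are majorized by `p` and have entropy at least $H(p)$, matching (8.58) and (8.59); the reverse relation `majorized_by(p, q)` is false in this example.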
Further, any double stochastic matrix $A$ can be written by a distribution $p$ on the
permutations $S_d$ as $(Ax)_i = \sum_{s \in S_d} p_s x_{s^{-1}(i)}$. Thus, when two positive-valued vectors
$x$ and $y$ have decreasing ordered elements, we can show that

$$\langle x, y\rangle \ge \langle x, Ay\rangle. \tag{8.60}$$


Let us now consider how majorization can be defined for two density matrices
$\rho$ and $\sigma$. The eigenvalues of $\rho$ and $\sigma$ form the respective vectors with real-number
components. Therefore, majorization can be defined with respect to these vectors.
Letting $\rho = \sum_j a_j |u_j\rangle\langle u_j|$ and $\sigma = \sum_j b_j |v_j\rangle\langle v_j|$, we can write $\rho \preceq \sigma$ if $a \preceq b$.
If $\rho$ and $\sigma$ come from different Hilbert spaces, let us define $\rho \preceq \sigma$ by adding zero
eigenvalues to the smaller Hilbert space until the sizes of the spaces are identical. The
relations $\rho \cong \sigma$ and $\rho \approx \sigma$ can be defined in a similar way.

As this is a partial ordering, if $\rho \preceq \rho'$ and $\rho' \preceq \sigma$, then $\rho \preceq \sigma$. Since the
entropy $H(\rho)$ of a density matrix $\rho$ depends only on its eigenvalues, if $\rho \preceq \sigma$, then
$H(\rho) \ge H(\sigma)$ due to (8.58). Further, we can also show that for a unital channel $\kappa$
(e.g., pinching),

$$\kappa(\rho) \preceq \rho. \tag{8.61}$$

Hence, we find that $H(\kappa(\rho)) \ge H(\rho)$, and therefore the first inequality in (5.82) is
satisfied even if $M$ is a general POVM. Thus, the following theorem can be shown
from Theorem 8.2.


Theorem 8.3 (Nielsen and Kempe [13]) Let $\rho^{A,B}$ be a separable state on $\mathcal{H}_A \otimes \mathcal{H}_B$.
Then, $\rho^{A,B} \preceq \rho^A \stackrel{\rm def}{=} \operatorname{Tr}_B \rho^{A,B}$.


Combining (8.58) with this theorem, we find that $H(\rho^{A,B}) \ge H(\rho^A)$ [i.e., (5.78)] if
$\rho^{A,B}$ is separable. This shows that any separable state $\rho$ satisfies [14]

$$H_\rho(B|A) \ge 0. \tag{8.62}$$

The following theorem shows how two entangled states can be transformed into
each other.


Theorem 8.4 (Nielsen [15], Vidal [16]) Let $|u\rangle\langle u|$ and $|v_j\rangle\langle v_j|$ be pure states on
$\mathcal{H}_A \otimes \mathcal{H}_B$. It is possible to transform the state $|u\rangle\langle u|$ into $|v_j\rangle\langle v_j|$ using a two-way
LOCC with probability $p_j$ if and only if the condition

$$\sum_{i=1}^k \lambda_i^{\downarrow} \le \sum_{i=1}^k \sum_j p_j \lambda_i^{j,\downarrow}, \quad \forall k \tag{8.63}$$

holds, where $\sqrt{\lambda_i^j}$ is the Schmidt coefficient of $|v_j\rangle$ and $\sqrt{\lambda_i}$ is the Schmidt coefficient
of $|u\rangle$. This operation can be realized by performing a measurement at A and then
performing a unitary state evolution at B dependently of the measurement outcome
$j$ at A. Of course,

$$H(\operatorname{Tr}_B |u\rangle\langle u|) \ge \sum_j p_j H(\operatorname{Tr}_B |v_j\rangle\langle v_j|). \tag{8.64}$$

In particular, it is possible to transform $|u\rangle\langle u|$ into $|v\rangle\langle v|$ using a two-way LOCC
with probability 1 if and only if the condition

$$\operatorname{Tr}_B |u\rangle\langle u| \preceq \operatorname{Tr}_B |v\rangle\langle v| \tag{8.65}$$

holds. These conditions still hold even if the two-way LOCC is restricted to a one-way
LOCC.
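
The deterministic criterion (8.65) is easy to test for concrete state vectors. The sketch below is an illustration under stated assumptions: NumPy is available, states are given as coefficient vectors in a product basis ordered as $|i\rangle_A|j\rangle_B$, and `schmidt_probs`, `majorized_by`, and `locc_convertible` are hypothetical helper names.

```python
import numpy as np

def schmidt_probs(psi, dA, dB):
    """Squared Schmidt coefficients of a pure state on C^dA (x) C^dB, largest first."""
    s = np.linalg.svd(np.reshape(psi, (dA, dB)), compute_uv=False)
    return np.sort(s**2)[::-1]

def majorized_by(a, b, tol=1e-9):
    """True if a is majorized by b (zero-padded to a common length)."""
    n = max(len(a), len(b))
    a = np.sort(np.pad(np.asarray(a, float), (0, n - len(a))))[::-1]
    b = np.sort(np.pad(np.asarray(b, float), (0, n - len(b))))[::-1]
    return bool(abs(a.sum() - b.sum()) <= tol
                and np.all(np.cumsum(a) <= np.cumsum(b) + tol))

def locc_convertible(u, v, dA, dB):
    """Condition (8.65): Tr_B|u><u| is majorized by Tr_B|v><v|."""
    return majorized_by(schmidt_probs(u, dA, dB), schmidt_probs(v, dA, dB))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)            # maximally entangled
partial = np.array([np.sqrt(0.8), 0, 0, np.sqrt(0.2)])
product = np.array([1.0, 0, 0, 0])

print(locc_convertible(bell, partial, 2, 2))     # True
print(locc_convertible(partial, bell, 2, 2))     # False
print(locc_convertible(partial, product, 2, 2))  # True
```

The three-level spectra $(0.5, 0.25, 0.25)$ and $(0.4, 0.4, 0.2)$ are incomparable under majorization, so neither corresponding state can be converted into the other with probability 1.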


Proof Step 1: Proof of the part “only if” First, we show that (8.63) holds if it is
possible to transform the pure state $|u\rangle\langle u|$ into $|v_j\rangle\langle v_j|$ with probability $p_j$. According
to the discussion concerning instruments in Sect. 7.1, an arbitrary state evolution $\kappa$
can be regarded as an instrument given by the Choi-Kraus representation $\{A_j\}_j$.
Therefore, we see that if the initial state is a pure state, the final state for each
measurement outcome $j$ must also be a pure state.

Now, consider local operations and two-way communications from A to B and
from B to A. This operation consists of repetitions of the following procedure. First,
A performs a measurement $\{A_j\}_j$ and then sends this measurement outcome $j$ to
B. Then, B performs a measurement $\{B_i^j\}_i$ at B corresponding to A’s measurement
outcome $j$. Finally, B sends his or her measurement outcome $i$ to A. Since the final
state after the measurement is also a pure state, the measurement at B may be written
as A’s measurement and a unitary operation at B corresponding to A’s measurement
outcome, according to Theorem 8.1.

Therefore, we see that the whole operation is equivalent to performing a measurement
$\{A_j\}_j$ at A and then performing a unitary state operation at B dependently
of the measurement outcome $j$ at A. By defining $\rho_u \stackrel{\rm def}{=} \operatorname{Tr}_B |u\rangle\langle u|$, the probability
of obtaining the measurement outcome $j$ is then $p_j \stackrel{\rm def}{=} \operatorname{Tr} A_j \rho_u A_j^*$. The final state
is a pure state, and the partially traced state is equal to $\frac{1}{\operatorname{Tr} A_j \rho_u A_j^*} A_j \rho_u A_j^*$. Taking
the unitary matrix $U_j$ giving the polar decomposition $\sqrt{\rho_u} A_j^* = U_j \sqrt{A_j \rho_u A_j^*}$, we
obtain

$$U_j A_j \rho_u A_j^* U_j^* = U_j \sqrt{A_j \rho_u A_j^*} \sqrt{A_j \rho_u A_j^*}\, U_j^* = \sqrt{\rho_u} A_j^* A_j \sqrt{\rho_u}.$$

If $P$ is a projection with rank $k$ and satisfies the equation $\operatorname{Tr} \rho_u P = \sum_{i=1}^k \lambda_i^{\downarrow}$, then

$$\sum_{i=1}^k \sum_j p_j \lambda_i^{j,\downarrow} = \sum_j \max\{\operatorname{Tr} A_j \rho_u A_j^* P_j \mid P_j \text{ is a projection of rank } k\}$$
$$\ge \sum_j \operatorname{Tr} U_j A_j \rho_u A_j^* U_j^* P = \sum_j \operatorname{Tr} \sqrt{\rho_u} A_j^* A_j \sqrt{\rho_u} P = \operatorname{Tr} \rho_u P = \sum_{i=1}^k \lambda_i^{\downarrow}.$$

Therefore, we obtain (8.63).
Step 2: Proof of the part “if” with the deterministic case Next, let us construct the
operation that evolves $|u\rangle\langle u|$ into $|v\rangle\langle v|$ with probability 1 when (8.65) is satisfied.
Let the Schmidt coefficients of $|u\rangle$ and $|v\rangle$ be $\sqrt{\lambda_i}$ and $\sqrt{\lambda'_i}$, respectively. Let a
stochastic matrix $(b^{i,j})$ satisfy Condition ④ of Theorem 8.2 when $x = \lambda = (\lambda_i)$ and
$y = \lambda' = (\lambda'_i)$. Now, let us define an orthonormal basis $\{u_i\}$ and $E_j$ by

$$\rho_u = \sum_i \lambda_i |u_i\rangle\langle u_i|, \quad E_j \stackrel{\rm def}{=} \sum_i b^{i,j} |u_i\rangle\langle u_i|. \tag{8.66}$$

Then, we have $\sum_j E_j = I$ because $B = (b^{i,j})$ is a stochastic matrix. The probability
of obtaining the measurement outcome $j$ for the measurement $\{E_j\}$ is $\operatorname{Tr} \rho_u E_j$. The
final state for this measurement outcome is a pure state, and the partially traced state
is $\frac{1}{\operatorname{Tr} \rho_u E_j} \sqrt{E_j}\, \rho_u \sqrt{E_j}$. Since $(B^j)^T \circ \lambda \approx \lambda'$, we have $\frac{1}{\operatorname{Tr} \rho_u E_j} \sqrt{E_j}\, \rho_u \sqrt{E_j} \cong
\operatorname{Tr}_B |v\rangle\langle v|$. Therefore, when an appropriate unitary state evolution is applied
dependently of the measurement outcome $j$, the final state will be $|v\rangle\langle v|$ with
probability 1.


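Step 3: Proof of the part “if” with the stochastic case Finally, we construct the
operation that evolves the pure state $|u\rangle\langle u|$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ into $|v_j\rangle\langle v_j|$ with probability
$p_j$ when the inequality (8.63) holds. Let $\lambda' = (\lambda'_i)$ be a probability distribution such
that

$$\sum_{i=1}^k \lambda_i'^{\downarrow} = \sum_{i=1}^k \sum_j p_j \lambda_i^{j,\downarrow}, \quad \forall k.$$

Then, the pure state $|v\rangle\langle v|$ is defined as the pure entangled state with the Schmidt
coefficient $\sqrt{\lambda'_i}$. The discussion of the deterministic case guarantees that there
exists an LOCC operation transforming $|u\rangle\langle u|$ into $|v\rangle\langle v|$ with the Schmidt coefficient
$\sqrt{\lambda'_i}$. Therefore, it is sufficient to construct the required operation when the equality
of (8.65) holds, i.e., $\lambda_i = \lambda'_i$. Let us define $b^{i,j} \stackrel{\rm def}{=} p_j \lambda_i^{j,\downarrow} / \lambda_i^{\downarrow}$. Then, $B = (b^{i,j})$ is a
stochastic matrix. Defining $E_j$ using (8.66), we have

$$p_j = \operatorname{Tr} \rho_u E_j, \quad \operatorname{Tr}_B |v_j\rangle\langle v_j| = \sum_i \lambda_i^{j,\downarrow} |u_i\rangle\langle u_i| = \frac{1}{\operatorname{Tr} \rho_u E_j} \sqrt{E_j}\, \rho_u \sqrt{E_j}.$$

This completes the proof. ∎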


Using this theorem, we obtain the following characterization. 


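Lemma 8.4 (Vidal et al. [17]) Let $v$ and $u$ be entangled pure states with Schmidt
coefficients $\sqrt{p_i}$ and $\sqrt{q_i}$ in decreasing order. Then, we have

$$|\langle u|v\rangle|^2 \le \Bigl(\sum_i \sqrt{p_i}\sqrt{q_i}\Bigr)^2. \tag{8.67}$$

The equality holds when vectors $v$ and $u$ have the Schmidt decompositions
$v = \sum_i \sqrt{p_i}\,|e_i^A\rangle \otimes |e_i^B\rangle$ and $u = \sum_i \sqrt{q_i}\,|e_i^A\rangle \otimes |e_i^B\rangle$ by the same Schmidt basis. Further,
we have

$$\max_{\kappa} \langle u|\kappa(|v\rangle\langle v|)|u\rangle = \Bigl(\sum_i \sqrt{p_i}\sqrt{q_i}\Bigr)^2, \tag{8.68}$$

where the maximum is taken over LOCC operations $\kappa$.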
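
Inequality (8.67) and its equality condition can be probed by random sampling. The following sketch assumes NumPy; it fixes two example spectra in decreasing order, draws random local unitaries (via a QR decomposition, which is sufficient for a sanity check but is not claimed to sample the invariant measure exactly), and compares the overlaps with the bound. Helper names are illustrative.

```python
import numpy as np

def random_unitary(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(G)
    return Q

def state_with_schmidt(probs, UA, UB):
    """Coefficient matrix of sum_i sqrt(probs_i) (UA|i>) (x) (UB|i>)."""
    return UA @ np.diag(np.sqrt(probs)) @ UB.T

rng = np.random.default_rng(7)
p = np.array([0.5, 0.3, 0.2])      # squared Schmidt coefficients of v, decreasing
q = np.array([0.6, 0.3, 0.1])      # squared Schmidt coefficients of u, decreasing
bound = np.sum(np.sqrt(p * q)) ** 2

# Random Schmidt bases: the overlap never exceeds the bound (8.67).
overlaps = []
for _ in range(200):
    v = state_with_schmidt(p, random_unitary(3, rng), random_unitary(3, rng))
    u = state_with_schmidt(q, random_unitary(3, rng), random_unitary(3, rng))
    overlaps.append(abs(np.vdot(u, v)) ** 2)

# Common Schmidt basis: the bound is attained.
same_basis = abs(np.vdot(state_with_schmidt(q, np.eye(3), np.eye(3)),
                         state_with_schmidt(p, np.eye(3), np.eye(3)))) ** 2
print(max(overlaps), same_basis, bound)
```

The largest sampled overlap stays below `bound`, while the pair sharing a Schmidt basis reaches it, which is the equality case stated in the lemma.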


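Proof Let $\rho$ and $\sigma$ be the reduced density matrices on $\mathcal{H}_A$ of $v$ and $u$. Then, there exists
a unitary matrix $U$ such that

$$\langle u|v\rangle = \operatorname{Tr}\sqrt{\rho}\sqrt{\sigma}\,U.$$

Assume that $\sigma$ is diagonalized as $\sigma = \sum_j q_j |e_j\rangle\langle e_j|$. Thus,

$$|\operatorname{Tr}\sqrt{\rho}\sqrt{\sigma}\,U|^2 = \Bigl|\sum_j \sqrt{q_j}\,\langle e_j|\sqrt{\rho}\,U|e_j\rangle\Bigr|^2 \le \Bigl(\sum_j \sqrt{q_j}\,|\langle e_j|\sqrt{\rho}\,U|e_j\rangle|\Bigr)^2$$
$$\le \Bigl(\sum_j \sqrt{q_j}\sqrt{\langle e_j|\sqrt{\rho}|e_j\rangle}\sqrt{\langle e_j|U^*\sqrt{\rho}\,U|e_j\rangle}\Bigr)^2$$
$$\le \Bigl(\sum_j \sqrt{q_j}\,\langle e_j|\sqrt{\rho}|e_j\rangle\Bigr)\Bigl(\sum_j \sqrt{q_j}\,\langle e_j|U^*\sqrt{\rho}\,U|e_j\rangle\Bigr).$$

Now, we diagonalize $\rho$ as $\rho = \sum_i p_i |f_i\rangle\langle f_i|$. Hence,

$$\sum_j \sqrt{q_j}\,\langle e_j|\sqrt{\rho}|e_j\rangle = \sum_{i,j} \sqrt{q_j}\sqrt{p_i}\,|\langle e_j|f_i\rangle|^2.$$

Since $|\langle e_j|f_i\rangle|^2$ is a double stochastic matrix, (8.60) implies $\sum_{i,j} \sqrt{q_j}\sqrt{p_i}\,|\langle e_j|f_i\rangle|^2
\le \sum_i \sqrt{q_i}\sqrt{p_i}$. Thus,

$$\sum_j \sqrt{q_j}\,\langle e_j|\sqrt{\rho}|e_j\rangle \le \sum_i \sqrt{q_i}\sqrt{p_i}.$$

Similarly, we have

$$\sum_j \sqrt{q_j}\,\langle e_j|U^*\sqrt{\rho}\,U|e_j\rangle \le \sum_i \sqrt{q_i}\sqrt{p_i}.$$

Therefore, we obtain (8.67).

Next, we prove (8.68). From the equality condition of (8.67) and Theorem 8.4, we
can easily verify the $\ge$ part of (8.68). Assume that the LOCC operation $\kappa$ generates
the state $v_j$ with probability $r_j$ from the initial pure state $v$. When the Schmidt
coefficient of $v_j$ is $(\sqrt{p_i^j})_i$, Corollary 8.2 and (8.67) imply

$$\langle u|\kappa(|v\rangle\langle v|)|u\rangle = \sum_j r_j |\langle u|v_j\rangle|^2 \le \sum_j r_j \Bigl(\sum_i \sqrt{p_i^j}\sqrt{q_i}\Bigr)^2 \le \Bigl(\sum_i \sqrt{\sum_j r_j p_i^j}\,\sqrt{q_i}\Bigr)^2.$$

Since Theorem 8.4 guarantees that $(p_i)_i \preceq (\sum_j r_j p_i^j)_i$, we obtain the $\le$ part of
(8.68). ∎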

Exercises 


8.30 Choose the CONSs $\{|u_i\rangle\}$ and $\{|v_i\rangle\}$ and the distributions $p = (p_i)$ and $q =
(q_i)$ such that $\rho = \sum_i p_i |u_i\rangle\langle u_i|$ and $\kappa(\rho) = \sum_j q_j |v_j\rangle\langle v_j|$. Then, prove (8.61) by
using the map $p = (p_i) \mapsto (\langle v_i|\kappa(\sum_j p_j |u_j\rangle\langle u_j|)|v_i\rangle)_i$.


8.31 Given a pure entangled state with Schmidt coefficients $\sqrt{\lambda_i}$, show that a maximally
entangled state can be produced with error probability 0 if and only if the size
of the maximally entangled state is less than $1/\lambda_1^{\downarrow}$.


8.5 Distillation of Maximally Entangled States 


In order to use the merit of entanglement, we often require maximally entangled 
states, not partially entangled states. Then, one encounters the distillation problem 
of maximally entangled states from partially entangled states. Such an operation is 
called entanglement distillation and is one of the established fields in quantum 
information theory. If the initial state is pure, it is called entanglement concentration. 
It has also been verified experimentally [18, 19]. Other experimental models have 
also been proposed by combining other protocols [20]. 

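Consider the problem of creating a maximally entangled state $|\Phi_L\rangle\langle\Phi_L|$ on $\mathbb{C}^L \otimes
\mathbb{C}^L$ from a pure state $|u\rangle\langle u|$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$. If the relation

$$\operatorname{Tr}_B |u\rangle\langle u| \preceq \operatorname{Tr}_B |\Phi_L\rangle\langle\Phi_L|$$

does not hold, it is impossible to create $|\Phi_L\rangle\langle\Phi_L|$ with probability 1. Therefore, we
must allow some failure probability in our scheme for creating $|\Phi_L\rangle\langle\Phi_L|$.


Theorem 8.5 ([21]) Consider the two-way LOCC operation $\kappa$ converting the initial
state $|u\rangle\langle u|$ to a maximally entangled state $|\Phi_L\rangle\langle\Phi_L|$. The optimal failure probability
$\varepsilon_1(\kappa, |u\rangle\langle u|)$ is less than $f(x) \stackrel{\rm def}{=} \operatorname{Tr}(\rho_u - xI)\{\rho_u - xI \ge 0\}$ if and only if

$$L \le \frac{1 - f(x)}{x}. \tag{8.69}$$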
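
For a given spectrum of $\rho_u$, the trade-off in Theorem 8.5 between the failure probability bound $f(x)$ and the admissible size $L \le (1 - f(x))/x$ is simple to tabulate. The sketch below uses only the Python standard library; the eigenvalues are an arbitrary example, `failure_bound` and `max_size` are hypothetical names, and a small tolerance is added before flooring to absorb floating-point error.

```python
import math

def failure_bound(lams, x):
    """f(x) = Tr (rho_u - x I){rho_u - x I >= 0} for the eigenvalues lams of rho_u."""
    return sum(max(l - x, 0.0) for l in lams)

def max_size(lams, x):
    """Largest integer L satisfying L <= (1 - f(x)) / x."""
    return math.floor((1.0 - failure_bound(lams, x)) / x + 1e-12)

lams = [0.5, 0.3, 0.2]          # squared Schmidt coefficients of |u>
for x in (0.5, 0.25, 0.2):
    print(x, failure_bound(lams, x), max_size(lams, x))
```

With $x$ equal to the largest eigenvalue, $f(x) = 0$ and the admissible size is $\lfloor 1/\lambda_1^{\downarrow} \rfloor$, which is the deterministic case; lowering $x$ raises the failure bound and can allow a larger $L$.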


Proof Since our operation has two outcomes “success” and “failure,” the distribution
of the outcome is described by the two-valued POVM $\{T, I - T\}$. Hence, from
Theorem 7.2, our operation is given by the combination of the state evolution (7.1)
due to a measurement $\{T, I - T\}$ and the TP-CP map dependently of its outcome of
“success” or “failure.” The final state $|v\rangle\langle v|$ corresponding to “success” should satisfy
$\rho_v (= \operatorname{Tr}_B |v\rangle\langle v|) \preceq \operatorname{Tr}_B |\Phi_L\rangle\langle\Phi_L|$ because of Theorem 8.4. Thus, Theorem 8.1
characterizes the minimum probability that the creation of $|\Phi_L\rangle\langle\Phi_L|$ fails as

$$\min_{T \ge 0 \text{ on } \mathcal{H}_A} \Bigl\{\operatorname{Tr} \rho_u (I - T) \Bigm| \frac{1}{\operatorname{Tr} \rho_u T} \sqrt{T} \rho_u \sqrt{T} \preceq \operatorname{Tr}_B |\Phi_L\rangle\langle\Phi_L|\Bigr\}$$
$$= \min_{0 \le T \le I \text{ on } \mathcal{H}_A} \bigl\{\operatorname{Tr} \rho_u (I - T) \bigm| \sqrt{\rho_u}\, T \sqrt{\rho_u} \le x\bigr\},$$

where we used (A.7) to rewrite the above equation; henceforth, we abbreviate $xI$ to
$x$. Now, let $L$ be the size of the maximally entangled state to be created and the ratio
$\frac{\operatorname{Tr} \rho_u T}{L}$ be fixed to $x$. Since $\operatorname{Tr} \rho_u T$ is the success probability, the minimum failure
probability can be calculated from the following equation:

$$\min_{0 \le T \le I \text{ on } \mathcal{H}_A} \bigl\{\operatorname{Tr} \rho_u (I - T) \bigm| \sqrt{\rho_u}\, T \sqrt{\rho_u} \le x\bigr\} = \min_{S \text{ on } \mathcal{H}_A} \{1 - \operatorname{Tr} S \mid 0 \le S \le \rho_u,\ S \le x\}$$
$$= 1 - \max_S \Bigl\{\sum_i \langle u_i|S|u_i\rangle \Bigm| \langle u_i|S|u_i\rangle \le \lambda_i, x\Bigr\}$$
$$= 1 - \sum_{i: \lambda_i \le x} \lambda_i - \sum_{i: \lambda_i > x} x = \operatorname{Tr}(\rho_u - x)\{\rho_u - x \ge 0\} = f(x), \tag{8.70}$$

where $S = \sqrt{\rho_u}\, T \sqrt{\rho_u}$. Therefore, if the failure probability is less than $f(x)$, the size
$L$ of the maximally entangled state satisfies

$$L \le \max_{x'} \Bigl\{\frac{1 - f(x')}{x'} \Bigm| f(x') \le f(x)\Bigr\} = \frac{1 - f(x)}{x}.$$

In the last equality, we used the fact that $f(x)$ is strictly monotonically decreasing
and continuous.

Conversely, if (8.69) is true, then by choosing a projection $T$ that attains the minimum
value in (8.70) and performing a two-valued projective measurement $\{T, I - T\}$
on the system $\mathcal{H}_A$, the outcome corresponding to $T$ will be obtained with a probability
$1 - f(x)$. Since the final state satisfies

$$\frac{1}{1 - f(x)}\, T \rho_u T \le \frac{x}{1 - f(x)} \le \frac{1}{L},$$

we may construct a maximally entangled state of size $L$ according to
Theorem 8.4. ∎


On the other hand, Lo and Popescu [2] characterized the optimal success probability
$P^{\rm opt}(u \to |\Phi_L\rangle)$ for obtaining a maximally entangled state $|\Phi_L\rangle$ as follows:

$$P^{\rm opt}(u \to |\Phi_L\rangle) = \min_{r: 1 \le r \le L} \frac{L}{L - r + 1} \sum_{i \ge r} \lambda_i^{\downarrow}. \tag{8.71}$$


Next, we consider the problem of determining how large a maximally entangled
state we can distill from a tensor product state $\rho^{\otimes n}$ of a partially entangled state $\rho$ on
$\mathcal{H}_A \otimes \mathcal{H}_B$, in the asymptotic case. Here, we formulate this problem in the mixed-state
case as well as in the pure-state case. In such problems, we require that our operation
$\kappa_n$ be optimized for a given partially entangled state $\rho^{\otimes n}$ and hence treat the first type
of entanglement of distillation:

$$E_{d,1}^{C}(\rho) \stackrel{\rm def}{=} \sup_{\{\kappa_n\} \subset C} \Bigl\{\lim \frac{1}{n} \log L(\kappa_n) \Bigm| \lim_{n \to \infty} \varepsilon_1(\kappa_n, \rho) = 0\Bigr\}, \tag{8.72}$$
$$E_{d,1}^{C,\dagger}(\rho) \stackrel{\rm def}{=} \sup_{\{\kappa_n\} \subset C} \Bigl\{\lim \frac{1}{n} \log L(\kappa_n) \Bigm| \lim_{n \to \infty} \varepsilon_1(\kappa_n, \rho) < 1\Bigr\}, \tag{8.73}$$

where $C$ denotes the set of local operations, i.e., the notations $C = \to$, $C = \emptyset$,
$C = \leftarrow$, $C = \leftrightarrow$, and $C = S$ imply the set of one-way ($\mathcal{H}_A \to \mathcal{H}_B$) LOCC operations,
only local operations, one-way ($\mathcal{H}_A \leftarrow \mathcal{H}_B$) LOCC operations, two-way
LOCC operations, and S-TP-CP maps, respectively. Here, we denote the size of the
maximally entangled state produced by the operation $\kappa$ by $L(\kappa)$. If $\rho$ is a mixed
state, it is extremely difficult to produce a maximally entangled state perfectly, even
allowing some failure probability. Therefore, let us relax our conditions and aim to
produce a state close to the desired maximally entangled state. Hence, for our operation
$\kappa'$, we will evaluate the error $\varepsilon_2(\kappa', \rho) \stackrel{\rm def}{=} 1 - \langle\Phi_L|\kappa'(\rho)|\Phi_L\rangle$ between the final
state $\kappa'(\rho)$ and the maximally entangled state $|\Phi_L\rangle\langle\Phi_L|$ of size $L$. When the initial
state is a pure state $v$ with Schmidt coefficient $\sqrt{\lambda_i}$, Lemma 8.4 gives the optimum
fidelity:

$$\max_{\kappa} \langle\Phi_L|\kappa(|v\rangle\langle v|)|\Phi_L\rangle = \Bigl(\sum_{i=1}^{L} \sqrt{\frac{\lambda_i^{\downarrow}}{L}}\Bigr)^2. \tag{8.74}$$

In the asymptotic case, we optimize the operation $\kappa'_n$ for a given $\rho^{\otimes n}$; thus, we
focus on the second type of entanglement of distillation:

$$E_{d,2}^{C}(\rho) \stackrel{\rm def}{=} \sup_{\{\kappa_n\} \subset C} \Bigl\{\lim \frac{1}{n} \log L(\kappa_n) \Bigm| \lim_{n \to \infty} \varepsilon_2(\kappa_n, \rho) = 0\Bigr\}, \tag{8.75}$$
$$E_{d,2}^{C,\dagger}(\rho) \stackrel{\rm def}{=} \sup_{\{\kappa_n\} \subset C} \Bigl\{\lim \frac{1}{n} \log L(\kappa_n) \Bigm| \lim_{n \to \infty} \varepsilon_2(\kappa_n, \rho) < 1\Bigr\}. \tag{8.76}$$


The following trivial relations follow from their definitions: 


EE) 2 El, EO) = Ey Oy BO = EE), 
for i = 1, 2. The following theorem holds under these definitions.


Theorem 8.6 (Bennett et al. [22]) The two kinds of entanglement of distillation of
any pure state $|u\rangle\langle u|$ in the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ can be expressed by the
reduced density $\rho_u = \operatorname{Tr}_B |u\rangle\langle u|$ as

$$E_{d,i}^{C}(|u\rangle\langle u|) = E_{d,i}^{C,\dagger}(|u\rangle\langle u|) = H(\rho_u),$$

for $i = 1, 2$ and $C = \emptyset, \to, \leftarrow, \leftrightarrow, S$.


The proof of this theorem will be given later, except for the case of $C = \emptyset$. This
case is proved in Exercise 8.33. This theorem states that the entropy of the reduced
density matrix $\rho_u = \operatorname{Tr}_B |u\rangle\langle u|$ gives the degree of entanglement when the state of
the total system is a pure state. Further, as shown by Hayashi and Matsumoto [23],
there exists an LO protocol that attains this bound without any knowledge about the
pure state $u$, as long as the given state is its tensor product state. That is, there exists
a local operation protocol (without any communication) that produces a maximally
entangled state of size $e^{nH(\rho_u)}$ and is independent of $u$. This protocol is often called
a universal concentration protocol.

For a general mixed state $\rho$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$, the entropy of
the reduced density does not have the same implication. Consider

$$E_{r,S}(\rho) \stackrel{\rm def}{=} \min_{\sigma:\,\text{separable}} D(\rho\|\sigma) \tag{8.77}$$

as its generalization for a mixed state $\rho$. This is called the entanglement of relative
entropy. Any pure state $|u\rangle\langle u|$ satisfies

$$E_{r,S}(|u\rangle\langle u|) = H(\operatorname{Tr}_B |u\rangle\langle u|). \tag{8.78}$$


Lemma 8.5 (Vedral and Plenio [1]) The entanglement of the relative entropy satisfies
the monotonicity property

$$E_{r,S}(\kappa(\rho)) \le E_{r,S}(\rho) \tag{8.79}$$

for any S-TP-CP map $\kappa$. Hence, any LOCC operation satisfies the above monotonicity
because it is an S-TP-CP map.


Proof Let $\sigma$ be a separable state such that $D(\rho\|\sigma) = E_{r,S}(\rho)$; then $\kappa(\sigma)$ is separable.
From the monotonicity of the relative entropy (5.36),

$$E_{r,S}(\kappa(\rho)) \le D(\kappa(\rho)\|\kappa(\sigma)) \le D(\rho\|\sigma) = E_{r,S}(\rho),$$

which gives (8.79). ∎


The following theorem may be proved by using the method in the proof of Lemma 3.7. 




Theorem 8.7 (Vedral and Plenio [1]) Any mixed state $\rho$ on the composite system 
$\mathcal{H}_A \otimes \mathcal{H}_B$ and any separable state $\sigma$ satisfy 


ET} (p) < Diplo). (8.80) 
Hence, we obtain 


ES3(p) < E,s(p), (8.81) 


+ dep. E,,5(p®") 
Eq3(p) < Exs(p) = lim (8.82) 


Proof Consider an S-TP-CP map «/, on H4 ® Hg and a real number r > D(pllc). 
Since «/ (o®”) is also separable, (8.7) implies that (® x |«/,(0®")|®ew) < e7"”. From 
T = |® yur) (Ben | > O we have 


T= (6, )*( Ben) (Bear) = CRI") = (61,)* Dene) (Beer |) 
=(K))"(L = |® ear) (Ben) > 0, 


where («/,)* is the dual map of «/, (see @ of Theorem 5.1). Moreover, 


(K,)" (Pear) (Dew |) = 0, 
Tr 0" (Kj) (Der) (Dew |) = (Ben |, (F2")|Benr) < e"”, (8.83) 


Since the matrix (K/,)*(|®.0)(®r|) satisfies the condition for the test 0 < (xj,)* 
(|Denr) (Pen |) < I, the inequality (3.138) in Sect. 3.8 yields 


—(s)—sr 


(Deore, (p2")| Dem) = Tr p2"(K),)*(|er) (Ber) eM, (8.84) 


for s < 0, where ¢(s) = #(s|pllc) = log Tr p!~*o*. Using arguments similar to 
those used for the Proof of Lemma 3.7, we have (yu |K/, (0®”)|®e»r) > 0. We thus 
obtain (8.80). Applying the same arguments to p®*, we have 


E,.s(p®*) 


E43(0) < —— 


Combining this relation with Lemma A.1 in Appendix, we obtain (8.82). a 
Conversely, the following lemma holds for a pure state. 


Lemma 8.6 Any pure state |u) (u| on the composite system H, ® He» satisfies 
Eq (\u)(ul) = A (pu). (8.85) 


Proof When R < H(p,), according to Theorem 8.5, there exists an operation Kk, 
satisfying 
1= Tr(p®” _ eR) po" _ ek > 0) 
L(kn) = | J 


E1(Kn, lu) (ul®") = Tr(p2" — e~"*) {p2" — e"* > O}.- (8.87) 


u u 


(8.86) 


Define w(s) = log Tr p!~’. The failure probability €)(«,, |u)(u|®") can then be 
calculated as 


E1(Rn, |) (ul ®") < Tr po" {o2" — e-"* > O} < Tr (p2")'* e™® 


= er s)t5R). 


for s > 0. Since R < H(p,), we can show that ¢(K,) — 0 using arguments similar 
to those given in Sect. 2.1. Based on this relation, we can show that 


_ Tr(p®” _ ent) foe” _ ek > o} ae 1, 


which proves that limy-, oo ett) = R for L(k,). Hence, we obtain (8.85). | 


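Proofs of Theorem 8.6 and (8.78) Let $|u\rangle = \sum_i \sqrt{p_i}\, |u_i \otimes u'_i\rangle$ and $\sigma = \sum_i p_i\, |u_i \otimes u'_i\rangle\langle u_i \otimes u'_i|$. 
Since $\{u_i\}$ and $\{u'_i\}$ are orthogonal systems, 


$$E_{r,S}(|u\rangle\langle u|) \le D(|u\rangle\langle u| \,\|\, \sigma) = H(\rho_u).$$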
Combining (8.81) and (8.85), we obtain 
E,,s(\u) (ul) < H(Trp |u) (ul) < Ez (\u)(ul) < E73 (u)(ul) < E,,s(\u) (ul). 


This proves Theorem 8.6 and (8.78). | 


When we treat the optimum outcome case, the following value is important: 


® ® 
£C,() max _max (PEa(PdIhe) 


K={K,JEC Ww Tr Kw (p) 
It can easily be checked that 
®,|(A @ B)p(A @ B)*|® 
EC, (p) = max (PHA @ BIOLA B BY") aes 
, A,B Tr p(A*A @ B*B) 


for $C = \rightarrow, \leftrightarrow, S$. This value is called the conclusive teleportation fidelity and was 
introduced by Horodecki et al. [24], who described the relation between this value and 
conclusive teleportation. 


Exercises 


8.32 Define the entanglement of exact distillation ES Cp) and the asymptotic 
entanglement of exact distillation See (p) 
def 


def 
Ep) = = 


lim , EX .(9) = max {log L(K)| €2(«, p) = O}, (8.89) 


noo 


EE 0%") 
n 


and show [21, 25] 
EZ; (\u) (ul) = — log Ay. 


(This bound can be attained by a tournament-like method for d = 2, but such a method 
is known to be impossible for d > 2 [26, 27].) 


8.33 Letu = >¢, /\iu4 @ uP? be a Schmidt decomposition, and define the POVM 
M*? = {M"}oer, as MX" = Diers Ju*)(uX| for X = A, B. Apply the mea- 


surements M“:" and M®:" to the both sides when the state is |u)®”". Show that the 
resultant state with the measurement outcome q is a maximally entangled state with 
the size |T;'|. Using this protocol, show E¥ (lu) (ul) > H(p,). This protocol is 
called a Procrustean method [22]. 
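
The yield of the measurement in Exercise 8.33 can be explored numerically. The sketch below assumes a two-level Schmidt decomposition with squared Schmidt coefficients (lam, 1 - lam); in that case an outcome with k occurrences of the first basis vector has probability $\binom{n}{k}\,\mathrm{lam}^k (1-\mathrm{lam})^{n-k}$ and leaves a maximally entangled state of size $\binom{n}{k}$. The function name is hypothetical and natural logarithms are assumed.

```python
import math

def average_type_yield(lam, n):
    """Average of (1/n) log |T_q^n| over the measurement outcomes q,
    for a pure state with squared Schmidt coefficients (lam, 1 - lam)."""
    average = 0.0
    for k in range(n + 1):
        size = math.comb(n, k)  # size of the type class, |T_q^n|
        prob = size * lam ** k * (1.0 - lam) ** (n - k)  # outcome probability
        average += prob * math.log(size)
    return average / n
```

The returned value never exceeds $H(\rho_u)$ and approaches it as n grows, which is the behaviour the exercise asks the reader to establish.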


8.34 Define the generalized Bell states us - = (4 ® XUZ UG and the gen- 


. Z def A,B\, A,B A,B def 
eralized Bell diagonal state pgei,, = >; j PijlUj; )(u;; |, where upg = 


> us ® u®. Show that E,,s(ppen,p) < logd — H(p). For the definition of Xz 
and Zs, see Example 5.8. 


8.35 Define the quantity 


le . 1 : 1 
EC (r|p) = sup | lim —— loge; (ky, p)| lim —— log L(K,) > r (8.90) 
{Kn}CC n "on 


for i = 1, 2 and any class C. Then, show [21, 25] 


Eq (|e) (ul) = max s(Hi4s(Pu) — 7) for C =>, —, >. 
8.36 Define the quantity 
Cx def. — 1 . 1 
Eq; (rip) = inf) lim —— log(1 — €;(kn, p))| lim —— log L(kn) > ry¢ (8.91) 
, {Kn}CC n n 


for i = 1, 2 and any class C. Then, show [21, 25] 


t(r = _; (Pu)) 


92 
1+t eo 


E**(r||u) (ul) > max 
ar ||u) (ul) = ne 


8.37 Show the following equation: 


E*® (\u) (ul) = —log A}. (8.93) 
(a) Show 


1 


max Tr |®q)(®g|K(o) < — (8.94) 
o€S, d 


(b) Show 


max Tr |1) (uo = Pee (8.95) 
TEvVs 


(c) Check the following relation: 
max{d| Tr «(\u)(u|)|®a) (Bal = 1} 
(a) , = 
< max{min (Tr |®j)(®4lx(o))1| Tr (a) (ul)! ®a)(al = 1} 


2 max{min (Tr K*(|Oa)(Pal)o)~*| Tr lu) (ulx* (1a) (Pal) = 1} 


o€S, 
(c 


) 
< max {min(Tr To) '| Tr |w)(u|T = 1} 
0<T<I oeéS, 


© min(Tr|u) (ula)! 2 orem eo) 


where S, is the set of separable states. 
(d) Prove (8.93). 


8.6 Dilution of Maximally Entangled States 


In the previous section, we considered the problem of producing a maximally entangled 
state from the tensor product of a particular entangled state. In this section, we 
examine the converse problem, i.e., producing a tensor product state of a particular 
entangled state from a maximally entangled state in the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$. 
In this book, we call this problem entanglement dilution even if the required state 
is mixed, although historically the term has been used only for the pure-state case. 

For an analysis of a mixed state $\rho$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$, we 
define the entanglement of formation $E_f(\rho)$ 
based on the probabilistic decomposition $\{(p_i, \rho_i)\}$ of $\rho$ [28]: 


$$E_f(\rho) := \min_{\{(p_i,\rho_i)\}} \sum_i p_i H(\mathrm{Tr}_B\, \rho_i). \qquad (8.97)$$


Since this minimum value is attained when all $\rho_i$ are pure, this minimization can be 
replaced by the minimization over probabilistic decompositions by pure states. From 
the above definition, a state $\rho_1$ on $\mathcal{H}_{A,1} \otimes \mathcal{H}_{B,1}$ and a state $\rho_2$ on $\mathcal{H}_{A,2} \otimes \mathcal{H}_{B,2}$ satisfy 

$$E_f(\rho_1) + E_f(\rho_2) \ge E_f(\rho_1 \otimes \rho_2). \qquad (8.98)$$


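Theorem 8.8 (Vidal [16]) Perform an operation corresponding to the S-TP-CP map 
$\kappa$ with respect to a maximally entangled state $|\Phi_L\rangle\langle\Phi_L|$ of size $L$ (the initial state). 
The fidelity between the final state and the target pure state $|x\rangle\langle x|$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ then 
satisfies 


$$\max_{\kappa \in S} F(\kappa(|\Phi_L\rangle\langle\Phi_L|), |x\rangle\langle x|) = \max_{\kappa \in \rightarrow} F(\kappa(|\Phi_L\rangle\langle\Phi_L|), |x\rangle\langle x|) = \sqrt{P(x, L)}, \qquad (8.99)$$


where $P(u, L)$ is defined using the Schmidt coefficients $\sqrt{\lambda_i}$ of $|u\rangle$, arranged in decreasing order $\lambda_1^{\downarrow} \ge \lambda_2^{\downarrow} \ge \cdots$, as follows: 

$$P(u, L) := \sum_{i=1}^{L} \lambda_i^{\downarrow}. \qquad (8.100)$$


Note the similarity between $P(u, L)$ and $P(p, L)$ given in Sect. 2.1.4. Furthermore, 
the fidelity between the final state and a general mixed state $\rho$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ satisfies 


$$\max_{\kappa \in S} F(\kappa(|\Phi_L\rangle\langle\Phi_L|), \rho) = \max_{\kappa \in \rightarrow} F(\kappa(|\Phi_L\rangle\langle\Phi_L|), \rho) = \max_{\{(p_i, x_i)\}} \sqrt{\sum_i p_i P(x_i, L)}, \qquad (8.101)$$


where $\{(p_i, x_i)\}$ is the probabilistic decomposition of $\rho$. 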


Using (2.50), we obtain 
1- PO, le)<e i, (8.102) 


where 0 <5 < 1 and py © Trp |x) (x1. 
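
The quantity $P(x, L)$ in (8.100) and the optimal fidelity $\sqrt{P(x, L)}$ in (8.99) are straightforward to evaluate once the squared Schmidt coefficients are known. The following is a minimal sketch with hypothetical function names; the input list is assumed to be normalized.

```python
import math

def top_schmidt_mass(schmidt_probs, L):
    """P(x, L): sum of the L largest squared Schmidt coefficients of |x>."""
    if L < 1:
        raise ValueError("L must be a positive integer")
    ordered = sorted(schmidt_probs, reverse=True)
    return sum(ordered[:L])

def optimal_dilution_fidelity(schmidt_probs, L):
    """sqrt(P(x, L)): the optimal fidelity in (8.99) when the initial state
    is a maximally entangled state of size L."""
    # Clip at 1 to absorb floating-point rounding in the sum.
    return math.sqrt(min(1.0, top_schmidt_mass(schmidt_probs, L)))
```

Once L reaches the Schmidt rank of the target state, the fidelity equals 1, so a maximally entangled state of that size suffices for exact conversion.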


Proof The proof only considers the case of a pure state $|x\rangle\langle x|$. Let $\{E_{A,i} \otimes E_{B,i}\}_i$ 
be the Choi–Kraus representation of the S-TP-CP map $\kappa$. Then 


$$\kappa(|\Phi_L\rangle\langle\Phi_L|) = \sum_i (E_{A,i} \otimes E_{B,i})\, |\Phi_L\rangle\langle\Phi_L|\, (E_{A,i} \otimes E_{B,i})^*.$$


Taking the partial trace inside the summation $\sum_i$ on the RHS, we have 


$$\mathrm{Tr}_B\, (E_{A,i} \otimes E_{B,i})\, |\Phi_L\rangle\langle\Phi_L|\, (E_{A,i} \otimes E_{B,i})^* = (E_{A,i} X_{\Phi_L} E_{B,i}^T)(E_{A,i} X_{\Phi_L} E_{B,i}^T)^*$$


from (1.22). Its rank is at most $L$. Let $y$ be a pure state on the composite system 
such that the rank of the reduced density of $y$ is equal to $L$. Thus, by proving that 

$$|\langle y|x\rangle| \le \sqrt{P(x, L)}, \qquad (8.103)$$


the proof can be completed. To this end, we define the pure state $|y_i\rangle\langle y_i|$ as 

$$q_i\, |y_i\rangle\langle y_i| := (E_{A,i} \otimes E_{B,i})\, |\Phi_L\rangle\langle\Phi_L|\, (E_{A,i} \otimes E_{B,i})^*,$$


where $q_i$ is a normalizing constant. Then, 


$$F^2(\kappa(|\Phi_L\rangle\langle\Phi_L|), |x\rangle\langle x|) = \sum_i q_i F^2(|y_i\rangle\langle y_i|, |x\rangle\langle x|) \le \sum_i q_i P(x, L).$$


We can use this relation to show that the fidelity does not exceed $\sqrt{P(x, L)}$. In a proof 
of (8.103), it is sufficient to show that $F(\rho_x, \sigma) \le \sqrt{P(x, L)}$ for $\rho_x = \mathrm{Tr}_B\, |x\rangle\langle x|$ 


and a density matrix o of rank L. First, let o = 4 Dilv;)(v;| and let P be the 


projection to the range of o. Since the rank of P is L, choosing an appropriate unitary 
matrix U, we obtain the following relation: 


i=1 


Tr |/pxVo| = Tr./p,J/oU = mvn( rie) 


L L L 
=>) VPilvilU Vole) < | vi | DL vil Vpelvi) ? (8.104) 
i=l i=1 i=l 


L 


L 
= | il aU" li) (VU Soli) S| DL vil Pe Pelvs) (8.105) 


i=l i=l 


=,/Tr Pp, < / P(x, L). (8.106) 


This evaluation can be checked as follows. Inequality (8.104) follows from the 
Schwarz inequality. Inequality (8.105) follows from $U^* |v_i\rangle\langle v_i| U \le I$. The final 
inequality (8.106) can be derived from the fact that $P$ is a projection of rank $L$. Thus, 
we obtain 


$$\max_{\kappa \in S} F(\kappa(|\Phi_L\rangle\langle\Phi_L|), |x\rangle\langle x|) \le \sqrt{P(x, L)}.$$


Conversely, we can verify the existence of an S-TP-CP map with a fidelity of 
$\sqrt{P(x, L)}$ by the following argument. There exists a pure state $y$ satisfying the equality 
in (8.103); this can be confirmed by considering the conditions for equality in the 
above inequalities. Since the pure state $|y\rangle\langle y|$ satisfies the majorization relation $\mathrm{Tr}_B\, |\Phi_L\rangle\langle\Phi_L| \preceq \mathrm{Tr}_B\, |y\rangle\langle y|$, 
according to Theorem 8.4, there exists a one-way LOCC that produces the pure state 
$|y\rangle\langle y|$ from the maximally entangled state $|\Phi_L\rangle\langle\Phi_L|$, i.e., it attains the RHS of (8.99). 
This proves the existence of an S-TP-CP map with a fidelity of $\sqrt{P(x, L)}$. ∎


Next, let us consider how large a maximally entangled state is required for producing 
the n-fold tensor product of the entangled state $\rho$. In order to examine the asymptotic case, we 
focus on the S-TP-CP map $\kappa_n$ to produce the state $\rho^{\otimes n}$. The following entanglement 
of cost $E_c^C(\rho)$ expresses the asymptotic conversion rate 


$$E_c^C(\rho) := \inf_{\{\kappa_n\} \subset C} \left\{ \varlimsup \frac{1}{n} \log L_n \,\middle|\, \lim_{n\to\infty} F(\rho^{\otimes n}, \kappa_n(|\Phi_{L_n}\rangle\langle\Phi_{L_n}|)) = 1 \right\}, \qquad (8.107)$$


which is the subject of the following theorem. 


Theorem 8.9 (Bennett et al. [22], Hayden et al. [29]) For any state $\rho$ on the composite 
system $\mathcal{H}_A \otimes \mathcal{H}_B$, 


$$E_c^C(\rho) = \lim_{n\to\infty} \frac{E_f(\rho^{\otimes n})}{n} = \inf_n \frac{E_f(\rho^{\otimes n})}{n} \qquad (8.108)$$

for $C = \rightarrow, \leftarrow, \leftrightarrow, S$. 

The above theorem implies that the entanglement cost $E_c^C(\rho)$ has the same value for 
$C = \rightarrow, \leftarrow, \leftrightarrow, S$. Hence, we denote it by $E_c(\rho)$. 


Proof of Pure-State Case. We prove Theorem 8.9 by analyzing the pure state |x) (x| 
in greater detail and by noting that 7)(s) = W(s| Trg |x) (x|). For any R > H(x), we 
can calculate how fast the quantity (1 — Optimal fidelity) approaches 0 according to 


1 1 
lim ——log(1 — V’P(p", e"®)) = lim ——log(1 — P(p",e"®)) 
n>oo on no Nn 

w(s)—sR 
0<s<1 1—s 


’ 


where we used (2.188) for $P(x^{\otimes n}, e^{nR})$ and $1 - \sqrt{1-\epsilon} \cong \frac{1}{2}\epsilon$. If $R < H(x)$, the 
fidelity approaches zero for any LOCC (or separable) operation. The speed of this 
approach is 


1 { —sR 
in ied Peres = min (8.109) 
n>o Nn 


2 s<0 l-s 


where we used (2.190). From these inequalities, we have Ee (p) = H(p,) for C = 
—>,<,<, S using the relationship between H(p) and (s) given in Sect. 2.1.4. 
That is, we may also derive Theorem 8.9 in the pure-state case. a 
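
The threshold behaviour at $R = H(\rho_x)$ described in this proof can be observed numerically. The sketch below assumes a target state with two squared Schmidt coefficients (lam, 1 - lam), so that $\rho_x^{\otimes n}$ has eigenvalue $\mathrm{hi}^{\,n-k}\,\mathrm{lo}^{\,k}$ with multiplicity $\binom{n}{k}$, and it evaluates $P(x^{\otimes n}, L)$ by summing the $L$ largest eigenvalues. The function name is hypothetical.

```python
import math

def top_mass_tensor_power(lam, n, L):
    """P(x^{(x)n}, L) for rho_x = diag(lam, 1 - lam): the sum of the L
    largest eigenvalues of the n-fold tensor power of rho_x."""
    hi, lo = max(lam, 1.0 - lam), min(lam, 1.0 - lam)
    remaining = L
    total = 0.0
    # Eigenvalues hi^(n-k) * lo^k decrease with k, so walk k upward.
    for k in range(n + 1):
        if remaining <= 0:
            break
        take = min(math.comb(n, k), remaining)
        total += take * hi ** (n - k) * lo ** k
        remaining -= take
    return min(total, 1.0)
```

With lam = 0.8 the entropy is about 0.50 nats. Taking $L = \lfloor e^{nR} \rfloor$, the value approaches 1 as n grows for R = 0.6 and approaches 0 for R = 0.3, in line with the two regimes discussed above.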


Since the additivity relation 


$$\frac{E_f(\rho^{\otimes n})}{n} = E_f(\rho) \qquad (8.110)$$


holds in the pure-state case, the entanglement of cost has a simple expression: 


$$E_c^C(\rho) = E_f(\rho), \qquad (8.111)$$

for $C = \rightarrow, \leftarrow, \leftrightarrow, S$. However, it is not known whether this formula holds for mixed 
states, except in a few cases, which will be treated later. Certainly, this problem is 
closely connected with other open problems, as will be discussed in Sect. 9.2. 

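Similar to (8.89), we define the entanglement of exact cost $E^C_{c,e}(\rho)$ and the 
asymptotic entanglement of exact cost $E^{C,\infty}_{c,e}(\rho)$ 


$$E^C_{c,e}(\rho) := \min_{\kappa \in C} \{\log L \mid F(\rho, \kappa(|\Phi_L\rangle\langle\Phi_L|)) = 1\}, \quad E^{C,\infty}_{c,e}(\rho) := \lim_{n\to\infty} \frac{E^C_{c,e}(\rho^{\otimes n})}{n} \qquad (8.112)$$


and the logarithm of the Schmidt rank for a mixed state $\rho$: 


$$E_{sr}(\rho) := \min_{\{(p_i,\rho_i)\}} \max_{i: p_i > 0} \log \operatorname{rank} \rho_i, \qquad (8.113)$$


where $\sum_i p_i \rho_i = \rho$, each $\rho_i$ is a pure state, and the rank of a pure state $\rho_i$ is understood as its Schmidt rank. Due to Theorem 8.4, any operation $\kappa$ 
satisfying $F(\rho, \kappa(|\Phi_L\rangle\langle\Phi_L|)) = 1$ makes a decomposition $\sum_i p_i \rho_i = \rho$ such that 
each $\rho_i$ is a pure state and the rank of $\rho_i$ is at most $L$. So, we have $E^C_{c,e}(\rho) = E_{sr}(\rho)$. 
Hence, 


$$E^{C,\infty}_{c,e}(\rho) = \lim_{n\to\infty} \frac{E_{sr}(\rho^{\otimes n})}{n} \quad \text{for } C = \rightarrow, \leftarrow, \leftrightarrow, S. \qquad (8.114)$$


Any pure state $|u\rangle\langle u|$ satisfies the additivity $E_{sr}(|u\rangle\langle u|^{\otimes n}) = n E_{sr}(|u\rangle\langle u|)$. However, 
the quantity $E_{sr}(\rho)$ with a mixed state $\rho$ does not necessarily satisfy the additivity. 
Moreover, there exists a state $\rho$ such that $E_{sr}(\rho) = E_{sr}(\rho^{\otimes 2})$ [30]. 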


Exercises 


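8.38 Let $\mathcal{H}_A = \mathcal{H}_B = \mathbb{C}^3$. Choose a state $\rho$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ such that the support 
belongs to $\{v \otimes u - u \otimes v \mid u, v \in \mathbb{C}^3\}$. Show that $E_f(\rho) = \log 2$ [31]. 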


8.39 Show that $E_f$ satisfies the monotonicity for a two-way LOCC $\kappa$: 


$$E_f(\rho) \ge E_f(\kappa(\rho)). \qquad (8.115)$$


8.40 Show that EC *(\u) (u|) > E(Trg |u)(u|) for any pure state |w)(u| by defining 
E“*(p) ina similar way to Theorem 8.7. This argument can be regarded as the strong 
converse version of Theorem 8.9. 


8.7 Unified Approach to Distillation and Dilution 


In this section, we derive the converse parts of distillation and dilution based on the 
following unified method. In this method, for a class of local operations $C$, we focus 
on the entanglement measure $E^C(\rho)$ of a state $\rho \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ that satisfies the 
following axioms. 

E1 (Normalization) $E^C(\rho) = \log d$ when $\rho$ is a maximally entangled state of 
size $d$. 

E2C (Monotonicity) $E^C(\kappa(\rho)) \le E^C(\rho)$ holds for any local operation $\kappa$ in 
class $C$. 

E3 (Continuity) When any two states $\rho_n$ and $\sigma_n$ of system $\mathcal{H}_n$ satisfy $\|\rho_n - \sigma_n\|_1 \to 0$, 
the convergence $\frac{|E^C(\rho_n) - E^C(\sigma_n)|}{\log \dim \mathcal{H}_n} \to 0$ holds. 

E4 (Convergence) The quantity $\frac{E^C(\rho^{\otimes n})}{n}$ converges as $n \to \infty$. 


Based on the above conditions only, we can prove the following theorem. 


Theorem 8.10 (Donald et al. [32]) When the quantity $E^C(\rho)$ satisfies the above 
conditions, 


$$E^C_{d,2}(\rho) \le E^{C,\infty}(\rho) \left( := \lim_{n\to\infty} \frac{E^C(\rho^{\otimes n})}{n} \right) \le E^C_c(\rho). \qquad (8.116)$$


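Proof Let $\kappa_n$ be a local operation in class $C$ from $(\mathcal{H}_A)^{\otimes n} \otimes (\mathcal{H}_B)^{\otimes n}$ to $\mathbb{C}^{d_n} \otimes \mathbb{C}^{d_n}$ 
such that⁴ 


$$\left\| |\Phi_{d_n}\rangle\langle\Phi_{d_n}| - \kappa_n(\rho^{\otimes n}) \right\|_1 \to 0, \qquad (8.117)$$


where $\frac{\log d_n}{n} \to E^C_{d,2}(\rho)$. From Conditions E1 and E3 and (8.117), we have 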


E (kin (p®")) 
+ — Ej 2(p) 
n 
E° (kn (p®")) — E©(|®q,)(® log dy 
ES (ten 92") = EU Paarl! , {los iio] 0 
n n ; 
Hence, Condition E2C guarantees that 
EC @n EC " @n 
Si = ) = #E,(). (8.118) 
n—->Oo n n—->Ooo n ‘ 


We obtain the first inequality. 
Next, we choose a local operation «, in class C from C% @ C” to (H,4)®" ® 
(Hp)®" such that 


lIkn (Iz, )(Pa, 1) — p?" llr > 0, 


ee CK Cyan 
where 8a, — E(p). Similarly, we can show \2 (en Pan) (an) Ee) | = 


. EC (kn (|®a,)(® log dy 
Since (kn (| an) dn) < oe 


, we obtain 


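⁴ If the operation $\kappa_n$ in $C$ has a larger output system than $\mathbb{C}^{d_n} \otimes \mathbb{C}^{d_n}$, there exists an operation $\kappa'_n$ in $C$ 
with the output system $\mathbb{C}^{d_n} \otimes \mathbb{C}^{d_n}$ such that $\| |\Phi_{d_n}\rangle\langle\Phi_{d_n}| - \kappa_n(\rho^{\otimes n}) \|_1 \ge \| |\Phi_{d_n}\rangle\langle\Phi_{d_n}| - \kappa'_n(\rho^{\otimes n}) \|_1$. 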
lim 2 E° (a), 
noo 


ES (p®") 
n 


For example, the entanglement of formation $E_f(\rho)$ satisfies Conditions E1, E2↔ 
(Exercise 8.39), E3 (Exercise 8.42), and E4 (Lemma A.1). Using this fact, Theorem 
8.10 yields an alternative proof of the converse part of Theorem 8.9. The entanglement 
of relative entropy $E_{r,S}(\rho)$ also satisfies Conditions E1, E2S (Lemma 8.5), and 
E4 (Lemma A.1). Further, Donald and Horodecki [33] showed Condition E3 for the 
entanglement of relative entropy $E_{r,S}(\rho)$. Similarly, by using this fact, Theorem 8.10 
yields an alternative proof of (8.82) of Theorem 8.7. 
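In addition, the maximum of the negative conditional entropy 


$$E^C_m(\rho) := \max_{\kappa \in C} -H_{\kappa(\rho)}(A|B) \qquad (8.119)$$


satisfies Conditions E1, E2C, E3 (Exercise 8.44), and E4 (Lemma A.1) for $C = \rightarrow, \leftrightarrow, S$. Thus, 


$$E^C_{d,2}(\rho) \le \lim_{n\to\infty} \frac{E^C_m(\rho^{\otimes n})}{n} \le E^C_c(\rho) \qquad (8.120)$$


for $C = \rightarrow, \leftrightarrow, S$. Conversely, as will be proved in Sect. 9.6, the opposite inequality 
(Hashing inequality) 


$$E^{\rightarrow}_{d,2}(\rho) \ge -H_\rho(A|B) \qquad (8.121)$$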


holds, i.e., there exists a sequence of one-way LOCC operations producing an approximate 
maximally entangled state of an approximate size of $e^{-nH_\rho(A|B)}$. Performing 
the local operation $\kappa_n$ in class $C$, we can prepare the state $\kappa_n(\rho^{\otimes n})$. Applying 
this sequence of one-way LOCC operations to the state $\kappa_n(\rho^{\otimes n})$, we can show that 
$E^C_{d,2}(\rho) = \frac{E^C_{d,2}(\rho^{\otimes n})}{n} \ge \frac{-H_{\kappa_n(\rho^{\otimes n})}(A|B)}{n}$, which implies $E^C_{d,2}(\rho) \ge \frac{E^C_m(\rho^{\otimes n})}{n}$. Thus, 
we obtain 


$$E^C_{d,2}(\rho) = \lim_{n\to\infty} \frac{E^C_m(\rho^{\otimes n})}{n}. \qquad (8.122)$$
Therefore, since the relation E,,5(p) < E¢(p) holds **', we obtain 
EC @n . E, @n 
ES,() = tim “2 << £6) < tim SP 
‘ noo n ; noo n 
E : Qn 
efi es EC (p). (8.123) 
noo n 


We also have the following relations without the limiting forms: 


$$-H_\rho(A|B) \le E^C_m(\rho) \le E^C_{d,2}(\rho) \le E^{C,\dagger}_{d,2}(\rho) \le E_{r,S}(\rho) \le E_f(\rho). \qquad (8.124)$$

The above quantities are the same in the pure-state case. However, the equalities do 
not necessarily hold in the mixed-state case. 

Indeed, the expression of $E^C_m(\rho)$ can be slightly simplified as follows. Consider 
a TP-CP $\kappa$ with the Choi–Kraus representation $\{F_i\}$. This operation is realized by 
the following process. First, we perform the measurement $M = \{M_i\}$ and obtain 
the outcome $i$ with probability $p_i = P^M_\rho(i)$, where $F_i = U_i \sqrt{M_i}$. Next, we perform 
the isometry matrix $U_i$ depending on $i$. All outcomes are sent to system $B$. Finally, 
we take the partial trace with respect to the measurement outcome on $\mathcal{H}_B$. Hence, 
Inequality (5.88) yields 


Ap) (A|B) S Ag: , U;./ Mi pM Us (A|B) 


=- 2 PiHy, saioivy (ALB): 
Since operation « is separable at least, the unitary U; has the form UA @ U?. Hence, 
Hu, (inlet (A|B) = A Sito (A|B). 
Therefore, 


ES (p) = Ban 2 Pill minim (AlB) = Se Ary (p) (A|BE), (8.125) 


where p= = pit (i), and 71, is the space spanned by {ef } because 


Ru (p) = @ le? )(eF'|. (8.126) 


Mp JM 
>; es Pi 


As another measure of entanglement, Christandl and Winter [34] introduced 
squashed entanglement: 


$$E_{sq}(\rho) := \inf \left\{ \frac{1}{2} I_{\rho_{A,B,E}}(A : B|E) \,\middle|\, \rho_{A,B,E} : \mathrm{Tr}_E\, \rho_{A,B,E} = \rho \right\}. \qquad (8.127)$$


It satisfies Conditions E1, E2↔ (see [34]), E3 (Exercise 8.43), and E4 and the 
additivity (Exercise 8.45) 


$$E_{sq}(\rho) + E_{sq}(\sigma) = E_{sq}(\rho \otimes \sigma). \qquad (8.128)$$
Hence, Theorem 8.10 implies that 


E® (p®" E Qn 
E?,(p) = lim En 0") < Esq(p) < lim ee Et (p). 
, noo n noo n 


Now, we give a theorem to calculate Eg.2(p). 
Theorem 8.11 Assume that, for a given state $\rho$ on $\mathcal{H}_A \otimes \mathcal{H}_B$, there exists a TP-CP map $\kappa$ from 
system $\mathcal{H}_B$ to the reference system $\mathcal{H}_R$ such that the equation 


$$\kappa(\mathrm{Tr}_{A,R}(M_i \otimes I_{B,R})|x\rangle\langle x|) = \mathrm{Tr}_{A,B}(M_i \otimes I_{B,R})|x\rangle\langle x| \qquad (8.129)$$


holds for any POVM $M = \{M_i\}$ on $\mathcal{H}_A$, where $|x\rangle$ is a purification of $\rho$ with the 
reference system $\mathcal{H}_R$. Then, the quantity $E^{\rightarrow}_{d,2}(\rho)$ is calculated as 


—H,(A|B) = E,7 (p) = = (8.130) 


Ee) 
n 

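Further, the condition for Theorem 8.11 holds in the following case. The system 
$\mathcal{H}_B$ can be decomposed as a composite system $\mathcal{H}_{B,1} \otimes \mathcal{H}_{B,2}$ such that the 
system $\mathcal{H}_{B,1}$ is unitarily equivalent to $\mathcal{H}_R$ by a unitary $U$. Moreover, the state 
$\mathrm{Tr}_{\mathcal{H}_A, \mathcal{H}_{B,2}} |x\rangle\langle x|$ commutes with the projection to the symmetric subspace of $\mathcal{H}_R \otimes \mathcal{H}_{B,1}$, 
which is spanned by the set $\{U(x) \otimes y + U(y) \otimes x \mid x, y \in \mathcal{H}_{B,1}\}$. In this case, any 
state $\rho_s$ on the symmetric subspace and any state $\rho_a$ on the antisymmetric subspace 
satisfy $U\, \mathrm{Tr}_R\, \rho_i\, U^* = \mathrm{Tr}_{B,1}\, \rho_i$ for $i = s, a$. Note that the antisymmetric subspace is 
spanned by the set $\{U(x) \otimes y - U(y) \otimes x \mid x, y \in \mathcal{H}_{B,1}\}$. Hence, the map $\kappa$ satisfying 
(8.129) is given as the map $\rho \mapsto U\, \mathrm{Tr}_{B,2}\, \rho\, U^*$ with the state $\rho$ on the system 
$\mathcal{H}_B = \mathcal{H}_{B,1} \otimes \mathcal{H}_{B,2}$. 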


Proof of Theorem 8.11 For any one-way (H4 — Hs) LOCC operation x’, the local 
operation on 7/4 can be described by the Choi—Kraus representation { F;} on 74, and 
the operation on 7g can be described by a set of TP-CP maps {x;} on 7g. Let |x) 
be the purification of p with the reference 7/r. Then, the measurement outcome i is 
obtained with the probability p; = Tr F; (Trg p) F*, and the resultant states with the 
measurement outcome i on 71g and 7/z are the states - Tra. r(M; ® Ip. r)|x)(x| and 


7 Tra,p(M; @ Iz, r)|x) (x|, respectively, where M; = F* F;. Since the monotonicity 
of transmission information (Exercise 5.23) for the TP-CP map & given in (8.129) 
implies that 


L i 


“(En Tra,r(Mi;i ® Ip, r)|x) i) - > iH (Tra, R(M; ® Iz,r)|x)(x|) 


(Xn Tr4,p(Mj ® Ip,r)|x) si) - >) pi (Tra.e(M; ® Ip,x)|x) (x1). 


L i 


Since >); pi Tra,r(Mi @Ip,r)|x)(x|) = Tra,r |x)(x| and >°; p; Tra,p(Mi @1p,r)|x) 
(x| = Tra,p |x)(x|, we have 
$-H_\rho(A|B)$ 


-( pi Tra,r(Mi ® Ines!) —H (x pi Tra.e(M; ® Invi) 


>>) piH (Tra,e(Mi ® Ip,2)1x) (x1) — >) pi (Tra.2 (Mi ® Ip,0)1x) (x1) 


L L 


=— >) piH,,(AIB). 
Further, from inequality (5.111) 


— Hyp) (AB) = — >> Pi Hesenyon (ALB) < — >- pi Hp, (ALB). 


Hence, we obtain — A,(,)(A|B) < —H,(A|B). 
Further, in this case, the tensor product state p®” also satisfies this condition. 
Hence, using (8.122), we obtain —H,(A|B) = 422 — E7,(p). | 


As generalizations of E 3 (p) and E,.5(p), we define 


EE sim (P) = max — Hcp) (ALB) (8.131) 
EO, sim (P) = max — Hy ,¢p)(AlB) (8.132) 
Ex4sis(p) = min Diys(plla) (8.133) 
Er4sis(p) = min Dy,.(ollo). (8.134) 


Then, we can show the following lemma. 


Lemma 8.7 When C is >, =, or S, we have 


Ets sm(P) < E1ssis(p) for s € [-1, 1] (8.135) 

EC, sim(P) < E14sis(p) for s € I-5; 00). (8.136) 

Proof Any separable state ¢ = >); pilui){ui| @ |vi)(v;| with |Juil] = llvil| = 1 
satisfies 

o <> pila ® vi) (vi| = Ls @ on. (8.137) 


t 


Since &(c) is separable for a € S and & € S, we have 
− Hl gsjngy (AIB) = min Diys(K(0) La ® 28) 


< min Di+5(K(p)||1A ®@ K(7)B) 

(a), () 

< min Dj45(K(p)||K(o)) < min Dj45(pllo) = E.isis(p), 
oeS oes 


where (a) follows from (8.137) and (e) of Exercise 5.25, and (b) follows from (a) 
of Exercise 5.25. Thus, we obtain (8.135). Similarly, we can show (8.136). a 


Exercises 


8.41 Show that $E_{r,S}(\rho) \le E_f(\rho)$ using the joint convexity of the quantum relative 
entropy. 


8.42 Show that the entanglement of formation $E_f(\rho)$ satisfies Condition E3 (continuity) 
(Nielsen [35]) following the steps below. 

(a) Let states p, and a, on the bipartite system H 4, ® 7g,y satisfy || Pn —On||1 > 0. 
Here, we assume that dim 7/4, = dim 7z,,. Show that there exists a decomposition 
Pr = >; Pn,ilXn,i) (Xn,i | such that maa 

| >; Pn,i H (11 |Xn,i) (Xn,il) — E¢(On)| — 0. Here, choose the purifications x, and 
Yn Of p, and o, based on Lemma 8.2. 

(b) Prove Condition E3 (continuity). 


8.43 Show that the squashed entanglement E,, (p) satisfies Condition E3 following 
the steps below. 

(a) Let states p, and o,, on the bipartite system H 4, ® 71g,y satisfy || Pn —On|l|1 > 0. 
Here, we assume that dim7(,4, = dim Hg, _,. Show that there exists an extension 
p28” of p, such that +|5J,s#e(A : BIE) — Esq(on)| > 0 using (5.106). 

(b) Show Condition E3 (continuity). 


8.44 Show that $E^C_m(\rho)$ satisfies Condition E3 (continuity) for $C = \rightarrow, \leftrightarrow, S$ using 
(8.125), (5.104), and the monotonicity of $d_1$. 


8.45 Show the additivity of squashed entanglement (8.128) using chain rule (5.109) 
for quantum conditional mutual information. 


8.46 Let $|x\rangle\langle x|$ be a purification of $\rho$ with the reference system $\mathcal{H}_R$. Assume that 
the state $\mathrm{Tr}_B\, |x\rangle\langle x|$ is separable between $\mathcal{H}_A$ and $\mathcal{H}_R$. Prove the equation 


$$E_f(\rho) + E_f(\sigma) = E_f(\rho \otimes \sigma) \qquad (8.138)$$


by using a discussion similar to the proof of (8.144) [31]. 


8.47 Show that the inequality Ee alp) < limpoo Err) holds even though we 
replace Condition E3 in Theorem 8.10 by the following condition: 


E3’ (Weak lower continuity) When the sequence of states p, on C“ @ C% satisfies 
‘ EC On * 1 dy 
IlPn — |Dy,) (Ba, lh — 0, then lim,-, 4 Pe = limp +00 <Ee 
8.48 Show the following relations 


t(r — Ejais(p)) 
Eo > ee 8.139 
121 |p) = er ee ( ) 


t(r — Ejsis(p)) 
l+t . 


EX3(r|p) = max (8.140) 
. 


8.8 Maximally Correlated State 


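Next, we introduce an important class of entangled states. When a state on the 
composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ has the form 


$$\rho_\alpha := \sum_{i,j} \alpha_{i,j}\, |u_i^A \otimes u_i^B\rangle\langle u_j^A \otimes u_j^B|, \qquad (8.141)$$


where $(\alpha_{i,j})$ is a matrix and $\{u_i^A\}$ ($\{u_i^B\}$) is an orthonormal basis of $\mathcal{H}_A$ ($\mathcal{H}_B$), it is called 
a maximally correlated state. A state $\rho$ is maximally correlated if and only if there 
exist CONSs $\{u_i^A\}$ and $\{u_i^B\}$ of $\mathcal{H}_A$ and $\mathcal{H}_B$ such that the outcome of the measurement 
$\{|u_i^A\rangle\langle u_i^A|\}$ coincides with that of $\{|u_i^B\rangle\langle u_i^B|\}$. Evidently, any pure entangled state 
belongs to this class. For this class, many of the entanglement measures introduced 
above can be calculated as follows. 

To calculate these quantities, we consider the separable state 


$$\sigma_\alpha := \sum_i \alpha_{i,i}\, |u_i^A \otimes u_i^B\rangle\langle u_i^A \otimes u_i^B|, \qquad (8.142)$$


which satisfies 

$$E_{r,S}(\rho_\alpha) \le D(\rho_\alpha\|\sigma_\alpha) = H(\sigma_\alpha) - H(\rho_\alpha) = -H_{\rho_\alpha}(A|B).$$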
Hence, we obtain 

$$-H_{\rho_\alpha}(A|B) = E^C_m(\rho_\alpha) = E^C_{d,2}(\rho_\alpha) = E^{C,\dagger}_{d,2}(\rho_\alpha) = E_{r,S}(\rho_\alpha) \qquad (8.143)$$


for $C = \rightarrow, \leftarrow, \leftrightarrow, S$. Regarding the entanglement of formation, as is shown later, the 
equation 


$$E_f(\rho_\alpha) + E_f(\sigma) = E_f(\rho_\alpha \otimes \sigma) \qquad (8.144)$$


holds for any maximally correlated state $\rho_\alpha$ on $\mathcal{H}_{A,1} \otimes \mathcal{H}_{B,1}$ and any state $\sigma$ on 
$\mathcal{H}_{A,2} \otimes \mathcal{H}_{B,2}$. Hence, any maximally correlated state $\rho_\alpha$ satisfies $E_f(\rho_\alpha) = E_c(\rho_\alpha)$. 
Indeed, many researchers [31, 36] conjectured that the equation (8.144) holds for 
two arbitrary states. The conjecture is called the additivity of entanglement formation. 
This relation can be generalized to the superadditivity of entanglement formation [37] 
as follows. 


$$E_f(\mathrm{Tr}_2\, \rho) + E_f(\mathrm{Tr}_1\, \rho) \le E_f(\rho). \qquad (8.145)$$


While the superadditivity of entanglement formation trivially implies the additivity 
of entanglement formation, the converse relation also holds [36], as shown in Sect. 9.2. 
However, as shown in Sect. 8.13, there is a counterexample to the superadditivity of 
entanglement formation. Hence, the additivity of entanglement formation does not 
hold for two general states. However, the additivity of entanglement formation for 
the tensor product case remains unsolved, i.e., it is still open whether the equation 
$E_f(\rho^{\otimes n}) = n E_f(\rho)$ holds in general. 

One might consider that $E_f(\rho)$ equals $E_{r,S}(\rho)$ for a maximally correlated state $\rho$ 
because this relation holds for pure states. However, this relation does not hold in 
general, as shown in $\mathbb{C}^2 \otimes \mathbb{C}^2$ by (8.321) and (8.322) in Sect. 8.16.1. 

A state $\rho$ is maximally correlated if and only if it has a probabilistic decomposition 
of pure states $(p_i, |x_i\rangle)$ such that all $|x_i\rangle$ have common Schmidt bases on $\mathcal{H}_A$ and 
$\mathcal{H}_B$. A necessary and sufficient condition for this was obtained by Hiroshima and Hayashi 
[38]. For example, any mixture of two maximally entangled states is maximally 
correlated. 
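
For a two-level maximally correlated state, the value $-H_{\rho_\alpha}(A|B) = H(\sigma_\alpha) - H(\rho_\alpha)$ has a closed form, because the nonzero eigenvalues of $\rho_\alpha$ are those of the matrix $\alpha$ and $\sigma_\alpha$ carries the diagonal of $\alpha$. The sketch below assumes a 2×2 matrix $\alpha$ with diagonal (a, 1 - a) and off-diagonal entry c; the function names and the Bell-state parametrization are illustrative assumptions, and natural logarithms are used.

```python
import math

def shannon_entropy(probs):
    """Entropy in nats of a probability vector (zero entries are skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def neg_conditional_entropy_2x2(a, c):
    """-H(A|B) = H(sigma_alpha) - H(rho_alpha) for alpha = [[a, c], [conj(c), 1 - a]]."""
    if not 0.0 <= a <= 1.0 or abs(c) ** 2 > a * (1.0 - a) + 1e-12:
        raise ValueError("alpha is not a density matrix")
    # Eigenvalues of alpha are (1 +/- radius) / 2; clip to absorb rounding.
    radius = min(1.0, math.sqrt((2.0 * a - 1.0) ** 2 + 4.0 * abs(c) ** 2))
    eigenvalues = [(1.0 + radius) / 2.0, (1.0 - radius) / 2.0]
    return shannon_entropy([a, 1.0 - a]) - shannon_entropy(eigenvalues)

def bell_mixture_value(p):
    """Mixture p*|Phi+><Phi+| + (1-p)*|Phi-><Phi-|, with
    |Phi+-> = (|00> +- |11>)/sqrt(2), so a = 1/2 and c = (2p - 1)/2."""
    return neg_conditional_entropy_2x2(0.5, (2.0 * p - 1.0) / 2.0)
```

For this mixture the value reduces to $\log 2 - h(p)$, where $h$ is the binary entropy: it equals $\log 2$ for a single Bell state and vanishes for the equal mixture.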

We also have another characterization of maximally correlated states. 


Lemma 8.8 Let $|x\rangle$ be a pure state on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_R$. Then, 
the following conditions are equivalent. 


① $\rho^{AB} := \mathrm{Tr}_R\, |x\rangle\langle x|$ is maximally correlated. 


② $\rho^{BR} := \mathrm{Tr}_A\, |x\rangle\langle x|$ has the following form 


$$\rho^{BR} = \sum_i p_i\, |u_i^B \otimes x_i^R\rangle\langle u_i^B \otimes x_i^R|, \qquad (8.146)$$


where $\{u_i^B\}$ is a CONS of $\mathcal{H}_B$, but $\{x_i^R\}$ is not necessarily a CONS of $\mathcal{H}_R$. 


Using this property of maximally correlated states, we can show that any maximally 
correlated state satisfies the condition for Theorem 8.11 (Exercise 8.49). Hence, we obtain 
another proof for a part of (8.143) with $C = \rightarrow$. 


Proof of (8.144) Let pq be a maximally correlated state on H4, ® Hg, and o be a 
state on 74, ® 7z,. Then, Let y; and yp be the purifications of p, and o with the 
reference systems 7/r,) and 7/r2, respectively. Then, any probabilistic decomposi- 
tion of p, ® a is given by a POVM M = {M;} on Hei ® Herz (Lemma 8.3). Due 
to (8.146) in Lemma 8.8, the matrix Tra, |u“')(u"| ® Z,,r,|y1)(yi] can be written 


as Dj |u? ; a (uP, a |. Fora POVM {M;} on 7p and each j, we define the distri- 


L 
bution Q/ := Tre M;(|x;")(x;"| @ Tra,,, |y2) (yal) and the state of on Ha, ® Ha, 
by Tre(M; ® Ia,,2,)|X7", y2)(x;", y2] = Q/o7. From the definition of maximally 
correlated states, we have the expression of the conditional state on system 7g as 
follows: Using the notation o}, ; := Tr, 0), we have 
Trr,a(M; ® T4,3)|¥1 @ y2)(y1 ® yal 


=Trrar > (Mi ® |ui"') (ui ® Tas,8)|y1 ® ya) (91 ® yal 
j 


=Trr.a, > (Mi ® Ta,,5,,8,)(pilup' x") uP", xf" | ® Lye, ) (yal) 
j 
= pjlu?')(uP| ® (Tre.ay(Mj ® Lay,a,)124", ya) (x7", yal) 
J 


=> pjlu?')(w?"| ® (Tre.ay(Mj ® LAy,B,)104", yo) (xF", yal) 
Jj 


=> pO) apie? | Oats (8.147) 
J 


Now, we define the distribution P; and the conditional distribution P;,; as P; (i) Py 
(jli) = pj; Q}. 
The strong concavity of von Neumann entropy (5.110) yields that 


ALS Py GlluF) (uF | @ of, 
J 


SHLD Py Glilu® ul) +o Pn Gl H,)- (8.148) 
j j 


Hence, the probabilistic decomposition by the POVM M yields the following average 
entropy on 71/3; ® 73,2: 


DY PAH [Par ildlu?) uF] @ of, 
i j 


> DU POH [DI Parle) PD + DY POP GAC, 
U wi 


ij 


Using the state p; defined by p;(i)p; = Trr Mi @ I4,,2,|¥1) (y1|@ Tray, |y2) (yal, 
we have the decompositions 


pa= > Pipi, 0 = >. Pili) Palio! (8.149) 


ij 


and the inequalities 
LOH Ps =D, PDEs (o) = Er (pa) (8.150) 


T poensoae® > I Pr@PiuGEs@]) = Ep). (8.151) 


ij 


Therefore, we obtain (8.144). a 


Evidently, (8.147) is essential to the proof of (8.144) in the following sense [31]. In 
fact, if the vectors us eae ud, in (8.142) are not orthogonal, the state p, does not 
satisfy (8.147) in general. As shown in Sect. 9.2, such a state is essentially related to 
entanglement-breaking channels. 


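Proof of Lemma 8.8 Assume Condition ①. Perform the POVM $\{|u_i^A\rangle\langle u_i^A|\}$. The final 
state on $\mathcal{H}_B \otimes \mathcal{H}_R$ is a pure state $|u_i^B \otimes x_i^R\rangle\langle u_i^B \otimes x_i^R|$. In this case, $\{u_i^B\}$ is a CONS of 
$\mathcal{H}_B$. Since any measurement on $\mathcal{H}_A$ gives a probabilistic decomposition on $\mathcal{H}_B \otimes \mathcal{H}_R$ 
(Lemma 8.3), we have (8.146). 

Next, assume Condition ②. There exists a CONS $\{u_i^A\}$ of $\mathcal{H}_A$ such that 


$$|x\rangle = \sum_i \sqrt{p_i}\, |u_i^A \otimes u_i^B \otimes x_i^R\rangle. \qquad (8.152)$$

Thus, when we perform the measurements $\{|u_i^A\rangle\langle u_i^A|\}$ and $\{|u_i^B\rangle\langle u_i^B|\}$, we obtain the 
same outcome. That is, $\rho^{AB}$ is maximally correlated. ∎

Further, as a generalization of a part of (8.143), we have the following lemma. 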


Lemma 8.9 The maximally correlated state pq given in (8.141) satisfies the equality 
in (8.135) and (8.136), i.e., 


Et ysim(Pa) = — At ysip,(AlB) = Diss (Palloa) 
=D145(al|D(a)) = Eissis (pa) (8.153) 

ES, sim (Pa) = — At, 1p, (AB) = Dy, (Palla) 
=D,,,(all|D(a)) = Ei4sis (Pa), (8.154) 


where D(q) is the diagonal matrix with the diagonal elements a;,; and Og is given 
in (8.142). In particular, when pq is pure, i.e., @ is pure, we have 


E14s18(00) = Hi-sip,(A), E1+sis(Po) = H 1 \),(A). (8.155) 
Proof Due to Lemma 8.7, it is sufficient to show 


Msn (A|B) = Di4s(Palloa) = Di+s(al| D(@)) (8.156) 
Bede, (A|B) = Dj ,,(Palléa) = Di4,(al|D(@)). (8.157) 
The second equations of (8.156) and (8.157) follow from the definitions of p,, and 
04. Now, we consider the projection P to the subspace spanned by the CONS {|u4 @ 
uP)};. Since PI4@(oq)gP+(U—P)I4@(oa)8U —P) = 148 (0), PPaP = Pas 
and PI, ® (0)gP = Gq, (b) of Exercise 5.25 implies that 


Di4s(PallLa ® (Oa)a) = Di+s (Palla). (8.158) 
Since (Pa) 8 = (a) B 
Diss(PallZa ® (Ga)8) = — Aj, 5), (AIB)- (8.159) 


Hence, we obtain (8.156). Similarly, we can show (8.157). 
Finally, we show (8.155) when $\alpha$ is pure. In this case, we have 


1 1 
D145(al| D(a)) = — log Tra!*’ D(a) = — log Tr aD(a)~* 
Ss S 
1 
a log Tr D(a)'* = A_s(a) = FA ~s\p, (A). 
and 


a) l : 
= log Tr(aD(a)~ ta)! *$ 


Dj,;(al|[D(@)) = . 


s Ss 


_! 1-7 ES oe 1-;~)l+s 
=- log Tr((Tr D(a) )a)™* = — log Tr((Tr D(a) ™) a) 
Ss Ss 


s 


1 
=— log(Tr D(a)! tm) = Hs (a) = Hi ,,(A), 
‘ ; 


T+s 
where (a) follows from Exercise 3.12. Hence, we obtain (8.155). | 
Exercises 


8.49 Consider the maximally correlated state p, given in (8.141). Employing the 
notation in (8.146) of Lemma 8.8, we define the TP-CP map « from the state on 
He, to the state He by K(p) & >; (uP plu?) |x) (xf |. Show that the TP-CP map « 
satisfies the condition (8.129). 


8.50 Show that the maximally correlated state p, satisfies 


t(r — Di41(Palloa) 
E>*(r|pa) > 8.160 
a2("|Pa) 2 fa ee ( ) 


where a (r|Pa) is defined in (8.91) and oy is given in (8.142). Inequality (8.160) 
is a generalization of (8.92). 
8.9 Dilution with Zero-Rate Communication


In this section, we treat entanglement dilution with small communication costs. When 
d is the Schmidt number of the initial pure entangled state |u)(u|, from the proof 
of the part “if” of Theorem 8.4, we can convert the pure entangled state |u)(u| to 
the other pure entangled state |v)(v| satisfying the condition (8.63) by using the 
measurement whose outcomes are at most d elements. That is, the required amount 
of classical communication is at most $\log_2 d$ bits. In this case, we call the number
of measurement outcomes the size of classical communication. Now, we consider 
the size of classical communication in the asymptotic situation. For this analysis, we 
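focus on the entanglement of cost with zero-rate communication:

\[ E_c^{\dashrightarrow}(\rho) \stackrel{\mathrm{def}}{=} \inf_{\{\kappa_n\} \subset \rightarrow} \left\{ \lim_{n\to\infty} \frac{\log L_n}{n} \;\middle|\; \frac{\log \mathrm{CC}(\kappa_n)}{n} \to 0,\ \lim_{n\to\infty} F\big(\rho^{\otimes n}, \kappa_n(|\Phi_{L_n}\rangle\langle\Phi_{L_n}|)\big) = 1 \right\}, \tag{8.161} \]

where $\mathrm{CC}(\kappa)$ is the size of classical communication, and $\dashrightarrow$ denotes the set of LOCCs
with zero-rate classical communication. This value is calculated in the pure-state
case as follows.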


Lemma 8.10 (Lo and Popescu [39])
\[ E_c^{\dashrightarrow}(|x\rangle\langle x|) = H(\operatorname{Tr}_B |x\rangle\langle x|). \]


Proof To prove this, we first assume that there exist two probability spaces $\Omega_n$ and
$\Omega'_n$ and a distribution $p'_n$ on $\Omega'_n$ for a given distribution $p$ on $\Omega$ and $\epsilon > 0$ such that


] Q! 1 2, 
HOA Ge ep SO te ee, ti | oy, 


noo n n= 00 n 


(8.162) 


Indeed, the pure state with the Schmidt coefficients $\sqrt{(p_{\mathrm{mix},\Omega_n} \times p'_n)_i}$ can be realized from the maximally entangled state with the size $|\Omega_n| \times |\Omega'_n|$ by classical
communication with a size of at most $|\Omega'_n|$. Therefore, if the state $|u\rangle\langle u|$ has the
Schmidt coefficients $\sqrt{p_i}$, its $n$-fold tensor product $|u\rangle\langle u|^{\otimes n}$ can be asymptotically
realized from the maximally entangled state with asymptotically zero-rate classical
communication.

It is sufficient to prove (8.162) by replacing a distribution p’, on £2), by a positive 
measure p’, on 2). Letting J, < J) be integers, we construct a measure p, on 
92" as follows. For a type q € T;, satisfying /, < |T7'| < I, we choose a subset 


~ def 
jog C i; such that [qa \ i | < 1,. We define a measure p, = p" 1g, where 


, def 


— ni 
2), — qT SIT Sl, fh 7 Then, 
d(Pn, p") < > l,e" dX. qu log pu 
qETy ln S|TP ISL, 
+ DY pap + DL pd) (8.163) 


qeTuIT?<ly geTy:|T2|>U, 


In this case, the measure p, has the form pmix,e, X p, With |2,| = J, and |2/| = 


IT", When we choose J, = e"#)-9 and I) = e"#)+9, limy soo zai = 2€ 
and limy-so0 weal = H(p) — «. From the discussion in Sect. 2.4.1, the right-hand 
side (RHS) of (8.163) goes to 0. | 


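Next, we focus on the mixed state $\operatorname{Tr}_{A_2,B_2} |x\rangle\langle x|$. Using Lemma 8.10, we can check
that

\[ E_c^{\dashrightarrow}(\operatorname{Tr}_{A_2,B_2} |x\rangle\langle x|) \le H(\operatorname{Tr}_B |x\rangle\langle x|). \]

Hence, defining the entanglement of purification for a state $\rho$ on $\mathcal{H}_{A_1} \otimes \mathcal{H}_{B_1}$:

\[ E_p(\rho) \stackrel{\mathrm{def}}{=} \min_{x:\, \operatorname{Tr}_{A_2,B_2} |x\rangle\langle x| = \rho} H(\operatorname{Tr}_B |x\rangle\langle x|), \tag{8.164} \]

we obtain

\[ E_c^{\dashrightarrow}(\rho) \le \lim_{n\to\infty} \frac{E_p(\rho^{\otimes n})}{n}. \]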

Conversely, the opposite inequality follows from generalizing Theorem 8.10. 
Hence, we have the following theorem. 


Theorem 8.12 (Terhal et al. [40])

\[ E_c^{\dashrightarrow}(\rho) = \lim_{n\to\infty} \frac{E_p(\rho^{\otimes n})}{n}. \tag{8.165} \]


To generalize Theorem 8.10, we prepare the following condition. 


E2' (Weak monotonicity) Let $\kappa$ be an operation containing quantum communication with size $d$. Then,

\[ E(\kappa(\rho)) \le E(\rho) + \log d. \]


Lemma 8.11 When the quantity $E(\rho)$ satisfies Conditions E1, E2', E3, and E4,

\[ \lim_{n\to\infty} \frac{E(\rho^{\otimes n})}{n} \le E_c^{\dashrightarrow}(\rho). \tag{8.166} \]

This inequality holds even if we replace the one-way classical communication in the
definition of $E_c^{\dashrightarrow}(\rho)$ with quantum communication.
Proof of Theorem 8.12 In fact, the entanglement of purification $E_p(\rho)$ satisfies Conditions E1, E2$\emptyset$ (Exercise 8.51), E2' (Exercise 8.51), E3 (Exercise 8.52), and E4.
Hence, (8.165) holds. ∎


Further, using relation (8.165), we can slightly modify Lemma 8.11. For this purpose, 
we introduce the following condition. 


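E1' (Strong normalization)
\[ E(|u\rangle\langle u|) = H(\operatorname{Tr}_B |u\rangle\langle u|). \]


Lemma 8.12 When the quantity $E(\rho)$ satisfies Conditions E1' and E2$\emptyset$,

\[ E(\rho) \le E_p(\rho). \tag{8.167} \]

Hence,

\[ E^{\infty}(\rho) \left( \stackrel{\mathrm{def}}{=} \lim_{n\to\infty} \frac{E(\rho^{\otimes n})}{n} \right) \le \lim_{n\to\infty} \frac{E_p(\rho^{\otimes n})}{n} = E_c^{\dashrightarrow}(\rho). \tag{8.168} \]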


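Proof of Lemma 8.11 We choose a local operation $\kappa_n$ with one-way classical communication with a size $l_n$ such that

\[ \big\| \kappa_n(|\Phi_{L_n}\rangle\langle\Phi_{L_n}|) - \rho^{\otimes n} \big\|_1 \to 0, \qquad \frac{\log l_n}{n} \to 0, \]

where $\frac{\log L_n}{n} \to E_c^{\dashrightarrow}(\rho)$. Condition E3 guarantees that $\left| \frac{E(\kappa_n(|\Phi_{L_n}\rangle\langle\Phi_{L_n}|))}{n} - \frac{E(\rho^{\otimes n})}{n} \right| \to 0$. Combining Conditions E2$\emptyset$ and E2', we have

\[ E(\kappa_n(|\Phi_{L_n}\rangle\langle\Phi_{L_n}|)) \le \log L_n + \log l_n. \]

Therefore, we obtain (8.166). ∎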


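Proof of Lemma 8.12 Let $|x\rangle$ be a purification of $\rho$ attaining the minimum of
$H(\operatorname{Tr}_B |x\rangle\langle x|)$. Hence, from Conditions E1' and E2$\emptyset$,

\[ E(\rho) \le E(|x\rangle\langle x|) = H(\operatorname{Tr}_B |x\rangle\langle x|) = E_p(\rho). \]

∎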
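The entanglement of purification $E_p(\rho)$ is bounded as follows.
Lemma 8.13 (Terhal et al. [40])

\[ E_p(\rho) \left( = \min_x \{ H(\operatorname{Tr}_B |x\rangle\langle x|) \mid \operatorname{Tr}_{A_2,B_2} |x\rangle\langle x| = \rho \} \right) \le \min\{ H(\rho^{A_1}), H(\rho^{B_1}) \}, \tag{8.169} \]

where $\dim \mathcal{H}_{A_2} \le d_{A_1} d_{B_1}$, $\dim \mathcal{H}_{B_2} \le (d_{A_1} d_{B_1})^2$, and $\rho^{A_1} = \operatorname{Tr}_{B_1} \rho$, $\rho^{B_1} = \operatorname{Tr}_{A_1} \rho$.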
Proof Let |x) be a purification of p and Hs, be its reference space. Then, any 
purification is given by an isometry U from H,, @ Ha, to Hp, ® Ha, as 


U ® Ia,,2, (Ix) (x| ® po)U ® Ih,,2,)", 
where po is a pure state on H,4,. Hence, 


Ep(p) = min H(K(Tra, |x) (x1), 


where « is a TP-CP map from H’, to 7/z,. Since the minimum value is attained with 
an extremal point, from Corollary 5.2 we can restrict $\mathcal{H}_{B_2}$ to a $(d_{A_1} d_{B_1})^2$-dimensional
space. Further, we can restrict $\mathcal{H}_{A_2}$ to a $d_{A_1} d_{B_1}$-dimensional space.

In addition, substituting $\kappa = \iota$, we obtain $E_p(\rho) \le H(\rho^{A_1})$. Similarly, the inequality $E_p(\rho) \le H(\rho^{B_1})$ holds. ∎


Now, we apply the above discussion to the evaluation of the quantum mutual information $I_\rho(A:B)$. The quantity $\frac{I_\rho(A:B)}{2}$ satisfies Conditions E1' and E2' (Exercise 5.42) and the additivity $\frac{I_\rho(A:B)}{2} + \frac{I_\sigma(A:B)}{2} = \frac{I_{\rho\otimes\sigma}(A:B)}{2}$. Hence,

\[ \frac{I_\rho(A:B)}{2} \le E_c^{\dashrightarrow}(\rho). \]

Exercises 


8.51 Show that the entanglement of purification $E_p(\rho)$ satisfies Conditions E2$\emptyset$
and E2'.


8.52 Show that the entanglement of purification $E_p(\rho)$ satisfies Condition E3 based
on the discussions in the proof of Lemma 8.13 and Exercise 8.42.


8.10 Discord 


Next, we consider non-classical correlation. For this purpose, we prepare the quantity
$C_d^{A\to B}(\rho)$ as a measure of the classical correlation as in Fig. 8.4.


Fig. 8.4 Discord
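\[ C_d^{A\to B}(\rho) \stackrel{\mathrm{def}}{=} \max_M \Big( H(\rho^B) - \sum_i P_\rho^M(i) H(\rho_i^B) \Big) = H(\rho^B) - \min_M H_{(\kappa_M \otimes \iota_B)(\rho)}(B|A), \tag{8.170} \]
\[ \rho_i^B \stackrel{\mathrm{def}}{=} \frac{1}{P_\rho^M(i)} \operatorname{Tr}_A (M_i \otimes I_B)\rho, \qquad P_\rho^M(i) \stackrel{\mathrm{def}}{=} \operatorname{Tr} \rho (M_i \otimes I_B), \]

where $M = \{M_i\}$ is a POVM on the system $\mathcal{H}_A$ [41], $\rho_i^B$ is the resultant state on
$\mathcal{H}_B$ with the measurement outcome $i$, and $\kappa_M$ is the channel $\rho \mapsto \sum_i (\operatorname{Tr} \rho M_i) |u_i\rangle\langle u_i|$ that records the outcome of $M$.
The quantity $C_d^{A\to B}(\rho)$ has another form as (Exercise 8.56)

\[ C_d^{A\to B}(\rho) = \max_{M:\, \operatorname{rank} M_i = 1} \Big( H(\rho^B) - \sum_i P_\rho^M(i) H(\rho_i^B) \Big). \tag{8.171} \]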


For the derivation of this equation, see (8.126). It is easily checked that it satisfies
Conditions E1' (Exercise 8.54) and E2$\emptyset$ (Exercise 8.54). Thus,

\[ \lim_{n\to\infty} \frac{C_d^{A\to B}(\rho^{\otimes n})}{n} \le E_c^{\dashrightarrow}(\rho). \tag{8.172} \]

In fact, $C_d^{A\to B}(\rho)$ satisfies Condition E3 (continuity) (Exercise 8.55).
For example, when state $\rho$ is a maximally correlated state, $C_d^{A\to B}(\rho)$ is calculated as

\[ C_d^{A\to B}(\rho) = H(\rho^B) \]

because there exists a POVM $M$ such that $H(\rho_i^B) = 0$. Moreover,

\[ C_d^{A\to B}(\rho^{\otimes n}) = n H(\rho^B). \]

Hence, from (8.169),

\[ E_c^{\dashrightarrow}(\rho) = H(\rho^B) = C_d^{A\to B}(\rho). \]
Further, we have an interesting characterization of $C_d^{A\to B}(\rho)$ when $|x\rangle$ is a purification of $\rho$ with the reference system $\mathcal{H}_R$. Since any probabilistic decomposition of
the state on $\mathcal{H}_B \otimes \mathcal{H}_R$ can be given by a POVM on $\mathcal{H}_A$ (Lemma 8.3 and (8.31)), the
relation

\[ C_d^{A\to B}(\rho) = H(\rho^B) - E_f(\rho^{B,R}) \tag{8.173} \]

holds, where $\rho^{B,R} \stackrel{\mathrm{def}}{=} \operatorname{Tr}_A |x\rangle\langle x|$ [42]. This quantity is different from the usual
entanglement measure in representing the amount of classical correlation because the
separable state $\sum_i \frac{1}{d} |u_i^A, u_i^B\rangle\langle u_i^A, u_i^B|$ with the CONSs $\{u_i^A\}$ and $\{u_i^B\}$ has the same
amount of $C_d^{A\to B}(\rho)$ as the maximally entangled state. This issue will be revisited in
Sect. 9.5.
Indeed, when $\mathcal{H}_A = \mathcal{H}_B$, we can define the flip operator (swapping operator)
$F$ as $F(u \otimes v) \stackrel{\mathrm{def}}{=} v \otimes u$. Operator $F$ has the form $F = P_s - P_a$, where $P_s$ ($P_a$) is
the projection to the symmetric space $\mathcal{H}_s$ (the antisymmetric space $\mathcal{H}_a$), which is
spanned by $\{u \otimes v + v \otimes u\}$ ($\{u \otimes v - v \otimes u\}$). The flip operator $F$ satisfies the
following property


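\[ \operatorname{Tr} (A \otimes B) F = \sum_{i,j} \langle u_i, u_j | (A \otimes B) F | u_i, u_j \rangle = \sum_{i,j} \langle u_i, u_j | A \otimes B | u_j, u_i \rangle \]
\[ = \sum_{i,j} \langle u_i | A | u_j \rangle \langle u_j | B | u_i \rangle = \sum_i \langle u_i | A B | u_i \rangle = \operatorname{Tr} AB. \tag{8.174} \]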
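The identity (8.174) and the decomposition $F = P_s - P_a$ are easy to confirm numerically. The following is a minimal sketch using NumPy; the function names are illustrative, and basis vectors $|u_i, u_j\rangle$ are assumed to be indexed as `i * d + j`, matching `np.kron`.

```python
import numpy as np

def flip_operator(d):
    """Flip (swap) operator F on C^d (x) C^d, defined by F|i,j> = |j,i>."""
    f = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            f[j * d + i, i * d + j] = 1.0
    return f

def flip_trace_pair(a, b):
    """Return (Tr (A (x) B) F, Tr AB) for square matrices of equal size."""
    f = flip_operator(a.shape[0])
    return np.trace(np.kron(a, b) @ f), np.trace(a @ b)

def symmetric_projections(d):
    """P_s = (I + F)/2 and P_a = (I - F)/2, so that F = P_s - P_a."""
    f = flip_operator(d)
    identity = np.eye(d * d)
    return (identity + f) / 2, (identity - f) / 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(3, 3))
    b = rng.normal(size=(3, 3))
    print(flip_trace_pair(a, b))  # the two entries should coincide
```

Writing $P_s$ and $P_a$ as $(I \pm F)/2$ uses only $F^2 = I$, which is why both are projections.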
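As is shown later, when the support $\operatorname{supp}(\rho)$ is contained in $\mathcal{H}_s$ or $\mathcal{H}_a$, the
equation

\[ E_p(\rho) = H(\rho^A) = H(\rho^B) \tag{8.175} \]

holds [43]. Since $\rho^{\otimes n}$ also satisfies this condition, we have

\[ E_c^{\dashrightarrow}(\rho) = E_p(\rho) = H(\rho^A) = H(\rho^B). \tag{8.176} \]

Proof of (8.175) Let $|u\rangle$ be a purification of $\rho$ with the reference systems $A_2$ and $B_2$.
Then, $F|u\rangle\langle u|F^* = |u\rangle\langle u|$, so that $H_{|u\rangle\langle u|}(B_1B_2) = H_{F|u\rangle\langle u|F^*}(B_1B_2) = H_{|u\rangle\langle u|}(A_1B_2) = H_{|u\rangle\langle u|}(B_1A_2)$. Hence, from inequality (5.100) we have

\[ H_{|u\rangle\langle u|}(A_1A_2) + H_{|u\rangle\langle u|}(B_1B_2) = H_{|u\rangle\langle u|}(A_1A_2) + H_{|u\rangle\langle u|}(B_1A_2) \ge H_{|u\rangle\langle u|}(A_1) + H_{|u\rangle\langle u|}(B_1). \]

Since $H_{|u\rangle\langle u|}(A_1A_2) = H_{|u\rangle\langle u|}(B_1B_2)$ and $H_{|u\rangle\langle u|}(A_1) = H_{|u\rangle\langle u|}(B_1)$, we obtain

\[ H_{|u\rangle\langle u|}(A_1A_2) \ge H_{|u\rangle\langle u|}(A_1), \]

which, together with (8.169), implies (8.175). ∎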


Now, in order to measure non-classical correlation, we introduce the discord as the
discrepancy between the two measures $C_d^{A\to B}(\rho)$ and $I_\rho(A:B)$

\[ D(B|A)_\rho := I_\rho(A:B) - C_d^{A\to B}(\rho) \tag{8.177} \]

because $I_\rho(A:B)$ expresses the whole correlation and $C_d^{A\to B}(\rho)$ expresses only
the classical correlation. Using the monotonicity of quantum relative entropy for the
measurement, we can show the non-negativity of the discord $D(B|A)_\rho$:

\[ D(B|A)_\rho \ge 0. \tag{8.178} \]

The discord $D(B|A)_\rho$ has another form (Exercise 8.58)

\[ D(B|A)_\rho = H_{\rho^{B,R}}(B|R) + E_f(\rho^{B,R}). \tag{8.179} \]

Hence, (8.178) can be also checked from (8.124).
Consider the case where $\rho$ has the following specific separable form:

\[ \rho = \sum_i p_i |u_i^A\rangle\langle u_i^A| \otimes \rho_i^B, \tag{8.180} \]

where $\{u_i^A\}$ is a CONS on $\mathcal{H}_A$. In this case, the optimal POVM $M$ on $\mathcal{H}_A$ is $\{|u_i^A\rangle\langle u_i^A|\}$
because $H(\rho^B) - \sum_i p_i H(\rho_i^B) = I_\rho(A:B)$. Thus,

\[ C_d^{A\to B}(\rho) = H(\rho^B) - \sum_i p_i H(\rho_i^B), \tag{8.181} \]

which implies the equality of (8.178). In fact, the converse argument holds without
assuming the form (8.180).
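The computation behind (8.181) can be illustrated numerically. The sketch below is a minimal example under stated assumptions: it evaluates $H(\rho^B) - \sum_i P(i) H(\rho_i^B)$ only for the PVM in the computational basis of $\mathcal{H}_A$ (not the maximum over all POVMs), so the returned gap is in general an upper bound on the discord; for a state of the form (8.180) with $u_i^A$ the computational basis, the gap vanishes. All function names are illustrative.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy (natural logarithm)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def reduced(rho, d_a, d_b, keep):
    """Reduced density matrix on system 'A' or 'B'."""
    r = rho.reshape(d_a, d_b, d_a, d_b)
    return np.einsum("ibjb->ij", r) if keep == "A" else np.einsum("aiaj->ij", r)

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def classical_quantum_state(p, states_b):
    """rho = sum_i p_i |i><i| (x) rho_i^B, i.e. the form (8.180)."""
    d_a, d_b = len(p), states_b[0].shape[0]
    rho = np.zeros((d_a * d_b, d_a * d_b), dtype=complex)
    for i, (p_i, rho_b) in enumerate(zip(p, states_b)):
        proj = np.zeros((d_a, d_a))
        proj[i, i] = 1.0
        rho += p_i * np.kron(proj, rho_b)
    return rho

def mutual_information(rho, d_a, d_b):
    return (entropy(reduced(rho, d_a, d_b, "A"))
            + entropy(reduced(rho, d_a, d_b, "B")) - entropy(rho))

def basis_measured_correlation(rho, d_a, d_b):
    """H(rho^B) - sum_i P(i) H(rho_i^B) for the computational-basis PVM on A."""
    r = rho.reshape(d_a, d_b, d_a, d_b)
    value = entropy(reduced(rho, d_a, d_b, "B"))
    for i in range(d_a):
        block = r[i, :, i, :]          # Tr_A (|i><i| (x) I_B) rho
        prob = np.trace(block).real
        if prob > 1e-12:
            value -= prob * entropy(block / prob)
    return value

def discord_upper_bound(rho, d_a, d_b):
    """I(A:B) minus the basis-measured correlation (>= the discord)."""
    return mutual_information(rho, d_a, d_b) - basis_measured_correlation(rho, d_a, d_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.2, 0.5, 0.3])
    rho = classical_quantum_state(p, [random_state(2, rng) for _ in p])
    print(discord_upper_bound(rho, 3, 2))   # numerically zero
```

Since the discord is non-negative by (8.178) and this gap is an upper bound, a vanishing gap certifies zero discord for the chosen state.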


Lemma 8.14 The equality of (8.178) holds if and only if $\rho$ has the specific separable
form of (8.180).


Due to this lemma, even though the state $\rho^{AB}$ is separable, if it does not have the form
(8.180), the separable state $\rho^{AB}$ has non-zero non-classical correlation $D(B|A)_\rho > 0$.


Proof As shown above, a state with the form of (8.180) satisfies the equality in
(8.178). Hence, we will prove (8.180) from the equality in (8.178). Due to (8.171),
from the equality in (8.178), there exists a POVM $M = \{M_i\}$ such that $\operatorname{rank} M_i = 1$ and

\[ H(\rho^B) - \sum_i P_\rho^M(i) H(\rho_i^B) = I_\rho(A:B). \tag{8.182} \]

We denote $M_i$ as $a_i |v_i\rangle\langle v_i|$, where $\|v_i\| = 1$. We can assume that $\rho^A > 0$ without
loss of generality. Now, we focus on the entanglement-breaking channel $\kappa_M$ from
system $\mathcal{H}_A$ to system $\mathbb{C}^k$ for a POVM $M = \{M_i\}_{i=1}^k$ on $\mathcal{H}_A$:

\[ \kappa_M(\rho) \stackrel{\mathrm{def}}{=} \sum_i (\operatorname{Tr} \rho M_i) |u_i\rangle\langle u_i|, \]

where $\{u_i\}$ is a CONS of $\mathbb{C}^k$. Then, the left-hand side (LHS) of (8.182) is equal to
$I_{(\kappa_M \otimes \iota_B)(\rho)}(A:B)$, i.e.,

\[ D\big((\kappa_M \otimes \iota_B)(\rho) \,\big\|\, (\kappa_M \otimes \iota_B)(\rho^A \otimes \rho^B)\big) = D(\rho \| \rho^A \otimes \rho^B), \]

where $\rho^A = \operatorname{Tr}_B \rho$, $\rho^B = \operatorname{Tr}_A \rho$. Applying Theorem 5.8, we have
pa OM cae, (Mi ® tn) 


pe (wile Wedd ere 9M, @ Ie) 
(uilp* lui) 


-¥ |u:)(u;| ® (Tra p(M; ® Ip). (8.183) 


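Now, we denote the resulting state on the system $\mathcal{H}_B$ with the measurement outcome
$j$ by $\rho_j^B$. Then,

\[ (\operatorname{Tr} \rho (M_j \otimes I_B)) \rho_j^B = \operatorname{Tr}_A (M_j \otimes I_B) \rho = \operatorname{Tr}_A (M_j \otimes I_B) \sum_i |u_i\rangle\langle u_i| \otimes (\operatorname{Tr}_A \rho (M_i \otimes I_B)) \]
\[ = \sum_i \langle u_i | M_j | u_i \rangle (\operatorname{Tr}_A \rho (M_i \otimes I_B)), \tag{8.184} \]

which implies $\sum_i \langle u_i | M_j | u_i \rangle \operatorname{Tr} \rho (M_i \otimes I_B) = \operatorname{Tr} \rho (M_j \otimes I_B)$. Thus, $P(i|j) := \frac{\operatorname{Tr} \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_j \otimes I_B)} \langle u_i | M_j | u_i \rangle$ gives a conditional distribution. Then, we have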


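\[ I_\rho(A:B) = H(\rho^B) - \sum_i \operatorname{Tr} \rho (M_i \otimes I_B)\, H\!\left( \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right) \]
\[ = H(\rho^B) - \sum_{i,j} \langle u_i | M_j | u_i \rangle \operatorname{Tr} \rho (M_i \otimes I_B)\, H\!\left( \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right) \]
\[ = H(\rho^B) - \sum_j \operatorname{Tr} \rho (M_j \otimes I_B) \sum_i P(i|j)\, H\!\left( \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right). \tag{8.185} \]

The relation (8.184) yields

\[ C_d^{A\to B}(\rho) = H(\rho^B) - \sum_j \operatorname{Tr} \rho (M_j \otimes I_B)\, H\!\left( \frac{\sum_i \langle u_i | M_j | u_i \rangle \operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_j \otimes I_B)} \right) \]
\[ = H(\rho^B) - \sum_j \operatorname{Tr} \rho (M_j \otimes I_B)\, H\!\left( \sum_i P(i|j) \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right). \tag{8.186} \]

Combining (8.185) and (8.186), we have

\[ D(B|A)_\rho = \sum_j \operatorname{Tr} \rho (M_j \otimes I_B)\, H\!\left( \sum_i P(i|j) \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right) - \sum_j \operatorname{Tr} \rho (M_j \otimes I_B) \sum_i P(i|j)\, H\!\left( \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right) \]
\[ = \sum_j \operatorname{Tr} \rho (M_j \otimes I_B) \left( H\!\left( \sum_i P(i|j) \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right) - \sum_i P(i|j)\, H\!\left( \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right) \right). \tag{8.187} \]

Since

\[ H\!\left( \sum_i P(i|j) \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right) \ge \sum_i P(i|j)\, H\!\left( \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} \right) \tag{8.188} \]

and $D(B|A)_\rho = 0$, the equality in (8.188) holds for all $j$. Now, we decompose the
set of indexes $i$ to the collection of disjoint subsets $S_a$ such that the relation

\[ \frac{\operatorname{Tr}_A \rho (M_i \otimes I_B)}{\operatorname{Tr} \rho (M_i \otimes I_B)} = \frac{\operatorname{Tr}_A \rho (M_j \otimes I_B)}{\operatorname{Tr} \rho (M_j \otimes I_B)} \tag{8.189} \]

holds for $i \neq j \in S_a$ and the relation (8.189) does not hold for $i \in S_a$ and $j \notin S_a$. By
denoting the state $\frac{\operatorname{Tr}_A \rho (M_j \otimes I_B)}{\operatorname{Tr} \rho (M_j \otimes I_B)}$ for $j \in S_a$ by $\rho_a^B$, the state $\rho$ is written as

\[ \rho = \sum_a \sum_{i \in S_a} \operatorname{Tr} \rho (M_i \otimes I_B) |u_i\rangle\langle u_i| \otimes \rho_a^B = \sum_a P_A(a)\, \rho_a^A \otimes \rho_a^B, \]

where $P_A(a) := \sum_{j \in S_a} \operatorname{Tr} \rho (M_j \otimes I_B)$ and $\rho_a^A := \sum_{i \in S_a} \frac{\operatorname{Tr} \rho (M_i \otimes I_B)}{P_A(a)} |u_i\rangle\langle u_i|$. Since the
equality in (8.188) implies (8.189) for $P(i|j) \neq 0$, the support of $\rho_a^A$ is orthogonal to
that of $\rho_{a'}^A$ for $a \neq a'$. Considering the spectral decomposition of each $\rho_a^A$, we obtain
the form (8.180). ∎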


Now, we calculate $E_c^{\dashrightarrow}(\rho)$ and $E_p(\rho)$ in another case by treating $C_d^{A\to B}(\rho)$. We
assume that $\rho$ has the form (8.180) and $\rho_i^B$ is pure. Then, we have (Exercise 8.57)

\[ I_\rho(A:B) = H(\rho^B) = E_p(\rho). \tag{8.190} \]

Further, for its purification $|x\rangle$ with the reference system $\mathcal{H}_R$, Exercise 8.14 guarantees that the state $\rho^{B,R} \stackrel{\mathrm{def}}{=} \operatorname{Tr}_A |x\rangle\langle x|$ is separable. So, (8.173) yields that
$C_d^{A\to B}(\rho) = H(\rho^B)$. Therefore,
Ce =A’) =1,(A* 8) = 2,0) = He": (8.191) 


In this case, we also have $I_{\rho^{\otimes n}}(A:B) = n H(\rho^B)$. Hence, (8.190), (8.165), and
(8.169) imply

\[ H(\rho^B) = \lim_{n\to\infty} \frac{I_{\rho^{\otimes n}}(A:B)}{n} \le E_c^{\dashrightarrow}(\rho) = \lim_{n\to\infty} \frac{E_p(\rho^{\otimes n})}{n} \le H(\rho^B). \tag{8.192} \]

Hence,

\[ E_c^{\dashrightarrow}(\rho) = C_d^{A\to B}(\rho) = H(\rho^B). \tag{8.193} \]
Exercises 


8.53 Show that

\[ C_d^{A\to B}(\rho) + C_d^{A\to B}(\sigma) = C_d^{A\to B}(\rho \otimes \sigma) \tag{8.194} \]

for a separable state $\rho$ using Exercise 8.46.

8.54 Show that the quantity $C_d^{A\to B}(\rho)$ satisfies Conditions E1' and E2$\emptyset$.
8.55 Show that the quantity $C_d^{A\to B}(\rho)$ satisfies Condition E3 using (8.173).
8.56 Show the equation (8.171).


8.57 Prove (8.190) for a separable state $\rho$ of the form (8.180) with $\operatorname{rank} \rho_i^B = 1$
following the steps below.
(a) Let $|X\rangle\rangle\langle\langle X|$ be a pure entangled state on $\mathcal{H}_A \otimes \mathcal{H}_B$ and $M = \{M_i\}$ be a rank-one
PVM on $\mathcal{H}_A$. Show that the state $\rho = \sum_i M_i \otimes X M_i X^*$ satisfies $H(\operatorname{Tr}_A |X\rangle\rangle\langle\langle X|) = I_\rho(A:B)$.

(b) Show $H(\operatorname{Tr}_A \rho) = I_\rho(A:B)$.

(c) Show $E_p(\rho) = H(\operatorname{Tr}_A \rho)$.


8.58 Show the equation (8.179) by using (8.173).


8.11 State Generation from Shared Randomness 


In this section, we address the state generation from minimum shared random numbers in an asymptotic formulation. If the desired state $\rho$ is nonseparable between
$\mathcal{H}_A$ and $\mathcal{H}_B$, it is impossible to generate state $\rho$ only from shared random numbers.
Hence, we treat a separable state:

\[ \rho = \sum_i p_i \rho_i^A \otimes \rho_i^B. \tag{8.195} \]

In particular, when the conditions

\[ [\rho_i^A, \rho_j^A] = 0 \quad \forall i, j, \tag{8.196} \]
\[ [\rho_i^B, \rho_j^B] = 0 \quad \forall i, j \tag{8.197} \]
hold, the problem is essentially classical. In this problem, our operation is described
by the size of shared random numbers $M$ and the local states $\sigma_i^A$ and $\sigma_i^B$ depending on
the shared random number $i = 1, \ldots, M$, i.e., we focus on our operation $\Phi \stackrel{\mathrm{def}}{=} \{\sigma_i^A \otimes \sigma_i^B\}_{i=1}^M$ for approximating state $\rho$. Its performance is characterized by the size
$|\Phi| \stackrel{\mathrm{def}}{=} M$ and the quality of the approximation $\left\| \frac{1}{|\Phi|} \sum_{i=1}^{|\Phi|} \sigma_i^A \otimes \sigma_i^B - \rho \right\|_1$. Hence,
the minimum size of shared random numbers is asymptotically characterized by
$C_c(\rho)$ (Fig. 8.5)⁵

\[ C_c(\rho) \stackrel{\mathrm{def}}{=} \inf_{\{\Phi_n\}} \left\{ \lim_{n\to\infty} \frac{1}{n} \log |\Phi_n| \;\middle|\; \lim_{n\to\infty} \left\| \frac{1}{|\Phi_n|} \sum_{i=1}^{|\Phi_n|} \sigma_{n,i}^A \otimes \sigma_{n,i}^B - \rho^{\otimes n} \right\|_1 = 0 \right\}, \tag{8.198} \]

where $\Phi_n = \{\sigma_{n,i}^A \otimes \sigma_{n,i}^B\}$. Since a shared random number with size $M$ can be
simulated by a maximally entangled state with size $M$, we have

\[ C_c(\rho) \ge E_c^{\dashrightarrow}(\rho). \tag{8.199} \]

Fig. 8.5 State generation from shared randomness
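For this analysis, we define the quantities

\[ C(\rho, \delta) \stackrel{\mathrm{def}}{=} \inf \left\{ I_{\rho^{ABE}}(AB:E) \;\middle|\; \operatorname{Tr}_E \rho^{ABE} = \rho,\ I_{\rho^{ABE}}(A:B|E) \le \delta,\ \rho^{ABE} = \sum_x p_x \rho_x^{A,B} \otimes |u_x^E\rangle\langle u_x^E| \right\}, \tag{8.200} \]
\[ \tilde{C}(\rho, \delta) \stackrel{\mathrm{def}}{=} \inf \left\{ I_{\rho^{ABE}}(AB:E) \;\middle|\; \operatorname{Tr}_E \rho^{ABE} = \rho,\ I_{\rho^{ABE}}(A:B|E) \le \delta \right\}, \tag{8.201} \]

where $\{u_x^E\}$ is a CONS on $\mathcal{H}_E$. From the definitions, the inequality

\[ C(\rho, \delta) \ge \tilde{C}(\rho, \delta) \tag{8.202} \]

holds. In particular, we can prove (Exercise 8.61)

\[ C(\rho) \stackrel{\mathrm{def}}{=} C(\rho, 0) = \tilde{C}(\rho, 0). \tag{8.203} \]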


⁵ The subscript c denotes “common randomness.”
Further, this quantity satisfies the following properties.


CM1 (Monotonicity) Operations $\kappa_A$ and $\kappa_B$ on $\mathcal{H}_A$ and $\mathcal{H}_B$ satisfy the monotonicity (Exercise 8.59)

\[ C(\rho, \delta) \ge C((\kappa_A \otimes \kappa_B)(\rho), \delta), \qquad \tilde{C}(\rho, \delta) \ge \tilde{C}((\kappa_A \otimes \kappa_B)(\rho), \delta). \tag{8.204} \]

CM2 (Additivity) The quantity $C(\rho)$ satisfies the additivity (Exercise 8.60):

\[ C(\rho \otimes \sigma) = C(\rho) + C(\sigma). \tag{8.205} \]

CM3 (Continuity) The former quantity $C(\rho, \delta)$ satisfies two kinds of continuity,
i.e., if $\rho_n$ is separable and $\rho_n \to \rho$, then

\[ \lim_{n\to\infty} C(\rho_n) = C(\rho), \tag{8.206} \]
\[ \lim_{\delta\to 0} C(\rho, \delta) = C(\rho, 0). \tag{8.207} \]

In particular, the convergence in (8.207) is locally uniform concerning $\rho$.
CM4 (Asymptotic weak-lower-continuity) When $\|\rho_n - \rho^{\otimes n}\|_1 \to 0$, the inequality

\[ \lim_{n\to\infty} \frac{C(\rho_n)}{n} \ge C(\rho) \tag{8.208} \]

holds.
CM5 $C(\rho)$ satisfies

\[ C(\rho) \ge I_\rho(A:B) \tag{8.209} \]

because

\[ I_{\rho^{ABE}}(AB:E) \ge I_{\rho^{ABE}}(A:E) = I_{\rho^{ABE}}(A:E) + I_{\rho^{ABE}}(A:B|E) = I_{\rho^{ABE}}(A:BE) \ge I_{\rho^{ABE}}(A:B) \]

for any extension $\rho^{ABE}$ of $\rho$ satisfying $I_{\rho^{ABE}}(A:B|E) = 0$.
CM6 When condition (8.196) holds, $C(\rho)$ is upper bounded as

\[ C(\rho) \le H_\rho(A). \tag{8.210} \]

This can be checked by substituting $\mathcal{H}_A$ into $\mathcal{H}_E$ in the definition of $\tilde{C}(\rho, 0)$.
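Property CM5 can be illustrated numerically for one concrete extension. The following is a minimal sketch, assuming a classical register $E$ that labels product components $\rho_x^A \otimes \rho_x^B$ (so that $I(A:B|E) = 0$); it evaluates $I(AB:E)$ as a Holevo-type quantity and compares it with $I_\rho(A:B)$. The function names and the random test states are illustrative assumptions, and the sketch checks a single feasible extension rather than computing the infimum $C(\rho)$.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy (natural logarithm)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def reduced(rho, d_a, d_b, keep):
    """Reduced density matrix on system 'A' or 'B'."""
    r = rho.reshape(d_a, d_b, d_a, d_b)
    return np.einsum("ibjb->ij", r) if keep == "A" else np.einsum("aiaj->ij", r)

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def separable_mixture(p, states_a, states_b):
    """rho = sum_x p_x rho_x^A (x) rho_x^B, i.e. the form (8.195)."""
    return sum(p_x * np.kron(a, b) for p_x, a, b in zip(p, states_a, states_b))

def info_with_classical_register(p, states_a, states_b):
    """I(AB:E) for rho^{ABE} = sum_x p_x rho_x^A (x) rho_x^B (x) |x><x|.

    For a classical register E this equals
    H(rho) - sum_x p_x (H(rho_x^A) + H(rho_x^B)).
    """
    rho = separable_mixture(p, states_a, states_b)
    average = sum(p_x * (entropy(a) + entropy(b))
                  for p_x, a, b in zip(p, states_a, states_b))
    return entropy(rho) - average

def mutual_information(rho, d_a, d_b):
    return (entropy(reduced(rho, d_a, d_b, "A"))
            + entropy(reduced(rho, d_a, d_b, "B")) - entropy(rho))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.3, 0.3, 0.4])
    states_a = [random_state(2, rng) for _ in p]
    states_b = [random_state(3, rng) for _ in p]
    rho = separable_mixture(p, states_a, states_b)
    print(info_with_classical_register(p, states_a, states_b),
          mutual_information(rho, 2, 3))
```

The first printed value should never fall below the second, in line with the chain of inequalities used for (8.209).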
Using the quantity $C(\rho)$, we can characterize $C_c(\rho)$ as follows.


Theorem 8.13 When $\rho$ is separable, then

\[ C_c(\rho) = C(\rho). \tag{8.211} \]

Hence, from (8.199),

\[ E_c^{\dashrightarrow}(\rho) \le C(\rho). \tag{8.212} \]

Further, there exists an example of separable states $\rho$ such that conditions (8.196)
and (8.197) hold and $C_c(\rho) > E_c^{\dashrightarrow}(\rho)$ [44].
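Proof Since the direct part follows from the discussion in Sect. 9.4, its proof will be
given in Sect. 9.4. Hence, we only prove the converse part here. Now, we choose the
state $\rho_n \stackrel{\mathrm{def}}{=} \frac{1}{|\Phi_n|} \sum_{i=1}^{|\Phi_n|} \sigma_{n,i}^A \otimes \sigma_{n,i}^B \otimes |u_i^E\rangle\langle u_i^E|$ such that $\|\operatorname{Tr}_E \rho_n - \rho^{\otimes n}\|_1 \to 0$. Then,
we have

\[ \log |\Phi_n| \ge I_{\rho_n}(AB:E) \ge C(\operatorname{Tr}_E \rho_n). \tag{8.213} \]

Hence, combining (8.208), we obtain

\[ \lim_{n\to\infty} \frac{1}{n} \log |\Phi_n| \ge C(\rho). \]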

Proof of (8.207) We first characterize the quantity $C(\rho)$ as follows. Since the state
$\rho^{(AB)E}$ is restricted to a separable state between $AB$ and $E$, the state $\rho^{(AB)E}$ is
given by a probabilistic decomposition $(p_i, \rho_i)$ of $\rho$. Now, recall that any probabilistic
decomposition of $\rho$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ is given by POVM $M$ on the reference system
as (8.30) and (8.31). In order to satisfy the condition $I_{\rho^{ABE}}(A:B|E) = 0$, any
component $\rho_i$ has a tensor product form. Hence,

\[ C(\rho) = \inf_M \left\{ I_{\rho_M^{ABE}}(AB:E) \;\middle|\; I_{\rho_M^{ABE}}(A:B|E) = 0 \right\}, \]

where

\[ \rho_M^{ABE} \stackrel{\mathrm{def}}{=} \sum_i \operatorname{Tr}_R (\sqrt{M_i} \otimes I)|x\rangle\langle x|(\sqrt{M_i} \otimes I) \otimes |u_i^E\rangle\langle u_i^E|. \]

Therefore, from Lemma A.12 we can restrict the range of the above infimum to the
POVM $M$ with at most $2(\dim \mathcal{H}_{A,B})^2$ elements. Since the set of POVMs with at
most $2(\dim \mathcal{H}_{A,B})^2$ elements is compact, the above infimum can be replaced by the
minimum. Further, we have

\[ C(\rho, \delta) = \inf_M \left\{ I_{\rho_M^{ABE}}(AB:E) \;\middle|\; I_{\rho_M^{ABE}}(A:B|E) \le \delta \right\}. \tag{8.214} \]

Since $I_{\rho_M^{ABE}}(A:B|E)$ is written as

\[ I_{\rho_M^{ABE}}(A:B|E) = \sum_i p_i I_\rho(M_i), \qquad p_i \stackrel{\mathrm{def}}{=} \operatorname{Tr} (M_i \otimes I)|x\rangle\langle x|, \]
\[ I_\rho(M_i) \stackrel{\mathrm{def}}{=} I_{\frac{1}{p_i} \operatorname{Tr}_R (\sqrt{M_i} \otimes I)|x\rangle\langle x|(\sqrt{M_i} \otimes I)}(A:B), \]

from Lemma A.12, we can restrict the range of the infimum in (8.214) to the POVMs
$M$ satisfying that $|M| \le 2(\dim \mathcal{H}_{A,B})^2$. Since the set of POVMs with at most
$2(\dim \mathcal{H}_{A,B})^2$ elements is compact, from Lemma A.4, we have

\[ \lim_{\delta\to 0} C(\rho, \delta) = C(\rho). \tag{8.215} \]

Indeed, the above convergence is locally uniform for $\rho$. From (5.106), the functions
$I_{\rho_M^{ABE}}(AB:E)$ and $I_{\rho_M^{ABE}}(A:B|E)$ satisfy

\[ |I_{\rho_M^{ABE}}(AB:E) - I_{\sigma_M^{ABE}}(AB:E)| \le 5\epsilon \log \dim \mathcal{H}_{A,B} + \eta_0(\epsilon) + 2h(\epsilon), \]
\[ |I_{\rho_M^{ABE}}(A:B|E) - I_{\sigma_M^{ABE}}(A:B|E)| \le 8\epsilon \log \dim \mathcal{H}_{A,B} + 6h(\epsilon), \]

where $\epsilon = \|\sigma - \rho\|_1$. Hence, the local uniformity follows by checking the discussion
in the proof of Lemma A.4. ∎


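Proof of (8.206) Now, we prove (8.206). Let $|x_n\rangle$ ($|x\rangle$) be a purification of $\rho_n$ ($\rho$)
such that $|\langle x | x_n \rangle| = F(\rho, \rho_n)$. From (3.48),

\[ \big\| |x\rangle\langle x| - |x_n\rangle\langle x_n| \big\|_1 \to 0. \tag{8.216} \]

We choose a POVM $M_n$ with at most $2(\dim \mathcal{H}_{A,B})^2$ elements such that

\[ I_{(\rho_n)_{M_n}^{ABE}}(AB:E) = C(\rho_n), \qquad I_{(\rho_n)_{M_n}^{ABE}}(A:B|E) = 0. \]

From (8.216), (5.105), and (5.106),

\[ \delta_n \stackrel{\mathrm{def}}{=} I_{\rho_{M_n}^{ABE}}(A:B|E) \to 0, \]
\[ \delta'_n \stackrel{\mathrm{def}}{=} |I_{\rho_{M_n}^{ABE}}(AB:E) - I_{(\rho_n)_{M_n}^{ABE}}(AB:E)| \to 0. \]

Hence,

\[ C(\rho_n) + \delta'_n \ge C(\rho, \delta_n). \]

From (8.215) we obtain the inequality $\liminf_{n\to\infty} C(\rho_n) \ge C(\rho)$.
Conversely, we choose a POVM $M$ with at most $2(\dim \mathcal{H}_{A,B})^2$ elements such
that

\[ I_{\rho_M^{ABE}}(AB:E) = C(\rho), \qquad I_{\rho_M^{ABE}}(A:B|E) = 0. \]

From (8.216), (5.105), and (5.106),

\[ \epsilon_n \stackrel{\mathrm{def}}{=} I_{(\rho_n)_M^{ABE}}(A:B|E) \to 0, \]
\[ \epsilon'_n \stackrel{\mathrm{def}}{=} |I_{(\rho_n)_M^{ABE}}(AB:E) - I_{\rho_M^{ABE}}(AB:E)| \to 0. \]

Hence,

\[ C(\rho) + \epsilon'_n \ge C(\rho_n, \epsilon_n). \]

Since the convergence of (8.215) is locally uniform, we obtain the opposite inequality
$\limsup_{n\to\infty} C(\rho_n) \le C(\rho)$. ∎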


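Proof of (8.208) Let $\rho_n^{ABE}$ be a state satisfying $\operatorname{Tr}_E \rho_n^{ABE} = \rho_n$, $I_{\rho_n^{ABE}}(A:B|E) = 0$,
and $I_{\rho_n^{ABE}}(AB:E) = C(\rho_n)$. From (5.99), the state $\rho_n^{ABE}$ satisfies $H_{\rho_n^{ABE}}(AB|E) \le \sum_i H_{\rho_n^{ABE}}(A_iB_i|E)$. Hence,

\[ C(\rho_n) = H_{\rho_n^{ABE}}(AB) - H_{\rho_n^{ABE}}(AB|E) \]
\[ \ge H_{\rho_n^{ABE}}(AB) - \sum_i H_{\rho_n^{ABE}}(A_iB_i|E) \]
\[ = H_{\rho_n}(AB) - \sum_i H_{\rho_n}(A_iB_i) + \sum_i \big( H_{\rho_n}(A_iB_i) - H_{\rho_n^{ABE}}(A_iB_i|E) \big) \]
\[ \ge H_{\rho_n}(AB) - \sum_i H_{\rho_n}(A_iB_i) + \sum_i C(\rho_{n,i}), \]

where $\rho_{n,i}$ is the reduced density matrix on $A_iB_i$. The final inequality follows from
the definition of $C(\rho_{n,i})$. Since $\rho_{n,i}$ approaches $\rho$, properties (8.206) and (5.92) yield

\[ \lim_{n\to\infty} \frac{C(\rho_n)}{n} \ge C(\rho). \]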

Exercises 
8.59 Prove inequality (8.204). 


8.60 Prove (8.205) following the steps below.

(a) Assume that an extension $\rho^{ABE}$ of $\rho^{A_1B_1} \otimes \rho^{A_2B_2}$ satisfies $I_{\rho^{ABE}}(A_1A_2 : B_1B_2|E) = 0$. Show that $I_{\rho^{ABE}}(A_1 : B_1|E) = I_{\rho^{ABE}}(A_2 : B_2|A_1B_1E) = 0$ using (5.109).

(b) Prove (8.205) using (a).


8.61 Prove (8.203) following the steps below.

(a) Assume that an extension $\rho^{ABE}$ of $\rho$ satisfies $I_{\rho^{ABE}}(A:B|E) = 0$. Show that
$I_{(\kappa_M \otimes \iota_{AB})(\rho^{ABE})}(A:B|E) = 0$ for any PVM $M$ on $\mathcal{H}_E$.

(b) Prove (8.203) using (a).
8.12 Positive Partial Transpose (PPT) Operations


In this section, we treat the class of positive partial transpose (PPT) maps (operations)
as a wider class of local operations than the class of S-TP-CP maps. Remember that
$\tau^A$ is defined as a transpose concerning the system $\mathcal{H}_A$ with the basis $\{u_1, \ldots, u_d\}$,
as defined in Example 5.7 in Sect. 5.2. As was mentioned in Sect. 5.2, any separable
state $\rho$ satisfies $\tau^A(\rho) = (\tau^A \otimes \iota_B)(\rho) \ge 0$. These states are called positive partial
transpose (PPT) states. Note that the PPT condition $\tau^A(\rho) \ge 0$ does not depend on
the choice of the basis of $\mathcal{H}_A$ (see the exercises of this section). A TP-CP map $\kappa$ from
a system $\mathcal{H}_A \otimes \mathcal{H}_B$ to another system $\mathcal{H}_{A'} \otimes \mathcal{H}_{B'}$ is called a positive
partial transpose (PPT) map (operation) if the map $\tau^{A'} \circ \kappa \circ \tau^A$ is a TP-CP map. As
is easily checked, any PPT map $\kappa$ can transform a PPT state into another PPT state.
This condition is equivalent to the condition that the matrix $K(\kappa)$ defined in (5.4) has
a PPT state form similar to a state on the composite system $(\mathcal{H}_A \otimes \mathcal{H}_{A'}) \otimes (\mathcal{H}_B \otimes \mathcal{H}_{B'})$.
Hence, any PPT state can be produced by a PPT operation without any entangled state.
Note that S-TP-CP maps also have a similar characterization. Since any separable
state on the composite system $(\mathcal{H}_A \otimes \mathcal{H}_{A'}) \otimes (\mathcal{H}_B \otimes \mathcal{H}_{B'})$ is a PPT state on the
composite system $(\mathcal{H}_A \otimes \mathcal{H}_{A'}) \otimes (\mathcal{H}_B \otimes \mathcal{H}_{B'})$, all S-TP-CP maps are PPT maps
[45]. Hence, the class of PPT maps $C = \mathrm{PPT}$ is the largest class of local operations
among $C = \emptyset, \rightarrow, \leftarrow, \leftrightarrow, S, \mathrm{PPT}$. Further, the definition of PPT maps does not
depend on the choice of the basis (see the exercises of this section). In addition, Cirac et al. [46] showed that any
PPT operation could be realized by a bound entangled state and an LOCC operation.

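As an entanglement measure related to PPT maps, we often focus on the log
negativity $\log \|\tau^A(\rho)\|_1$, which does not depend on the choice of the basis (see the exercises of this section). For
a pure state $u = \sum_i \sqrt{\lambda_i}\, u_i^A \otimes u_i^B$,

\[ \tau^A(|u\rangle\langle u|) = \sum_{i,j} \sqrt{\lambda_i \lambda_j}\, |u_j^A \otimes u_i^B\rangle\langle u_i^A \otimes u_j^B|. \]

Then,

\[ |\tau^A(|u\rangle\langle u|)| = \sqrt{\tau^A(|u\rangle\langle u|)^* \tau^A(|u\rangle\langle u|)} = \sum_{i,j} \sqrt{\lambda_i}\sqrt{\lambda_j}\, |u_i^A \otimes u_j^B\rangle\langle u_i^A \otimes u_j^B| = \Big( \sum_i \sqrt{\lambda_i}\, |u_i\rangle\langle u_i| \Big)^{\otimes 2}. \tag{8.217} \]

Therefore, $\|\tau^A(|u\rangle\langle u|)\|_1 = \big( \sum_i \sqrt{\lambda_i} \big)^2$, i.e.,

\[ \log \|\tau^A(|u\rangle\langle u|)\|_1 = H_{\frac{1}{2}}(\operatorname{Tr}_B |u\rangle\langle u|). \tag{8.218} \]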
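The pure-state formula for the log negativity can be checked directly. The following is a minimal sketch with illustrative function names; it assumes the Schmidt bases are the computational bases and that composite indices are ordered as `i * d_b + k`, matching `np.kron`.

```python
import numpy as np

def partial_transpose_a(rho, d_a, d_b):
    """Transpose on system A of a matrix on C^{d_a} (x) C^{d_b}."""
    r = rho.reshape(d_a, d_b, d_a, d_b)
    return r.transpose(2, 1, 0, 3).reshape(d_a * d_b, d_a * d_b)

def trace_norm(m):
    """Sum of the singular values of m."""
    return float(np.sum(np.linalg.svd(m, compute_uv=False)))

def pure_state_from_schmidt(lam):
    """|u><u| for u = sum_i sqrt(lam_i) |i> (x) |i>."""
    d = len(lam)
    u = np.zeros(d * d)
    for i, l in enumerate(lam):
        u[i * d + i] = np.sqrt(l)
    return np.outer(u, u)

def log_negativity(rho, d_a, d_b):
    return float(np.log(trace_norm(partial_transpose_a(rho, d_a, d_b))))

def renyi_half(lam):
    """Renyi entropy of order 1/2: 2 log sum_i sqrt(lam_i)."""
    return float(2.0 * np.log(np.sum(np.sqrt(lam))))

if __name__ == "__main__":
    lam = np.array([0.5, 0.3, 0.2])
    rho = pure_state_from_schmidt(lam)
    print(log_negativity(rho, 3, 3), renyi_half(lam))  # the two should agree
```

For the uniform choice $\lambda_i = 1/L$ the same routine returns $\log L$, and all singular values of the partially transposed state equal $1/L$.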


In particular,

\[ \tau^A(|\Phi_L\rangle\langle\Phi_L|) = \frac{1}{L} F, \qquad |\tau^A(|\Phi_L\rangle\langle\Phi_L|)| = \frac{1}{L} I, \tag{8.219} \]

where $F$ is the flip operator $P_s - P_a$. Moreover, the log negativity satisfies the
additivity $\log \|\tau^A(\rho \otimes \sigma)\|_1 = \log \|\tau^A(\rho)\|_1 + \log \|\tau^A(\sigma)\|_1$ and the monotonicity
regarding the PPT operations $\kappa$

\[ \|\tau^A(\kappa(\rho))\|_1 \le \|\tau^A(\rho)\|_1, \tag{8.220} \]

i.e.,

\[ \log \|\tau^A(\kappa(\rho))\|_1 \le \log \|\tau^A(\rho)\|_1. \tag{8.221} \]


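Using (8.219), we can generalize relation (8.7) as

\[ \langle \Phi_L | \rho | \Phi_L \rangle = \operatorname{Tr} \rho |\Phi_L\rangle\langle\Phi_L| = \operatorname{Tr} \tau^A(\rho)\, \tau^A(|\Phi_L\rangle\langle\Phi_L|) \le \|\tau^A(\rho)\|_1 \, \|\tau^A(|\Phi_L\rangle\langle\Phi_L|)\| = \frac{\|\tau^A(\rho)\|_1}{L}. \tag{8.222} \]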
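Inequality (8.222) bounds the overlap of any state with a maximally entangled state by its negativity. The following is a minimal numerical sketch on $\mathbb{C}^L \otimes \mathbb{C}^L$; the random test state and the function names are illustrative assumptions.

```python
import numpy as np

def partial_transpose_a(rho, d_a, d_b):
    """Transpose on system A of a matrix on C^{d_a} (x) C^{d_b}."""
    r = rho.reshape(d_a, d_b, d_a, d_b)
    return r.transpose(2, 1, 0, 3).reshape(d_a * d_b, d_a * d_b)

def trace_norm(m):
    return float(np.sum(np.linalg.svd(m, compute_uv=False)))

def maximally_entangled(size):
    """Vector Phi_L = L^{-1/2} sum_i |i> (x) |i>."""
    phi = np.zeros(size * size)
    for i in range(size):
        phi[i * size + i] = 1.0 / np.sqrt(size)
    return phi

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def overlap_and_bound(rho, size):
    """Return (<Phi_L|rho|Phi_L>, ||tau^A(rho)||_1 / L)."""
    phi = maximally_entangled(size)
    overlap = float(np.real(phi.conj() @ rho @ phi))
    bound = trace_norm(partial_transpose_a(rho, size, size)) / size
    return overlap, bound

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(overlap_and_bound(random_state(9, rng), 3))
```

For $\rho = |\Phi_L\rangle\langle\Phi_L|$ itself the bound is attained, since $\|\tau^A(|\Phi_L\rangle\langle\Phi_L|)\|_1 = L$.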
This relation implies that

\[ E_{d,2}^{\mathrm{PPT},\dagger}(\rho) \le D(\rho \| \sigma) + \log \|\tau^A(\sigma)\|_1. \tag{8.223} \]


The RHS is called an SDP (semidefinite programming) bound [45] and satisfies the
monotonicity, i.e.,

\[ D(\rho \| \sigma) + \log \|\tau^A(\sigma)\|_1 \ge D(\kappa(\rho) \| \kappa(\sigma)) + \log \|\tau^A(\kappa(\sigma))\|_1 \]

for a PPT operation $\kappa$. It implies inequality (8.220). As a consequence, we have


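\[ E_{d,2}^{\mathrm{PPT},\dagger}(\rho) \le \log \|\tau^A(\rho)\|_1, \tag{8.224} \]
\[ E_{d,2}^{\mathrm{PPT},\dagger}(\rho) \le D(\rho \| \sigma), \quad \text{for a PPT state } \sigma. \tag{8.225} \]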


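Hence, the entanglement of relative entropy with PPT states $E_{r,\mathrm{PPT}}(\rho) \stackrel{\mathrm{def}}{=} \min_{\sigma:\mathrm{PPT}} D(\rho \| \sigma)$ and the semidefinite programming (SDP) bound $E_{\mathrm{SDP}}(\rho) \stackrel{\mathrm{def}}{=} \min_\sigma \big( D(\rho \| \sigma) + \log \|\tau^A(\sigma)\|_1 \big)$ do not increase for a PPT operation, i.e., the SDP bound satisfies the
monotonicity. Further, from (8.223) we obtain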


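\[ E_{d,2}^{\mathrm{PPT},\dagger}(\rho) \le \lim_{n\to\infty} \frac{E_{\mathrm{SDP}}(\rho^{\otimes n})}{n} \le \lim_{n\to\infty} \frac{E_{r,\mathrm{PPT}}(\rho^{\otimes n})}{n}. \tag{8.226} \]

This relation implies that

\[ E_{r,S}(\rho) = E_{r,\mathrm{PPT}}(\rho) = E_{\mathrm{SDP}}(\rho) = \lim_{n\to\infty} \frac{E_{\mathrm{SDP}}(\rho^{\otimes n})}{n} = \lim_{n\to\infty} \frac{E_{r,\mathrm{PPT}}(\rho^{\otimes n})}{n} \tag{8.227} \]

when $-H_\rho(A|B) = E_{r,S}(\rho)$ because $E_{r,S}(\rho)$ is not smaller than $E_{r,\mathrm{PPT}}(\rho)$, $E_{\mathrm{SDP}}(\rho)$,
$\lim_{n\to\infty} \frac{E_{\mathrm{SDP}}(\rho^{\otimes n})}{n}$, and $\lim_{n\to\infty} \frac{E_{r,\mathrm{PPT}}(\rho^{\otimes n})}{n}$.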
lim,_, 
Regarding the direct part, since the quantity E ue (p) satisfies Condition E3’ (weak 
lower continuity) because of Fannes’ inequality (5.92), from Exercise 8.47 and the 
Hashing inequality (8.121), we can show 


EPPT( ,2” 
EFPPT(p) = lim Em (OM) (8.228) 
. n> n 
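For any state $\sigma$, we choose the positive semidefinite matrix

\[ \sigma' \stackrel{\mathrm{def}}{=} \frac{\sigma}{\|\tau^A(\sigma)\|_1}. \tag{8.229} \]

Then,

\[ D(\rho \| \sigma) + \log \|\tau^A(\sigma)\|_1 = D(\rho \| \sigma'). \tag{8.230} \]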
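Relation (8.230) is a one-line consequence of $\log(\sigma/c) = \log \sigma - (\log c) I$, and it can be confirmed numerically. The following is a minimal sketch assuming full-rank $\rho$ and $\sigma$ (so that the matrix logarithms exist) and extending $D(\rho\|\cdot)$ to the unnormalized $\sigma'$ by the same formula $\operatorname{Tr}\rho(\log\rho - \log\sigma')$; all names are illustrative.

```python
import numpy as np

def partial_transpose_a(rho, d_a, d_b):
    """Transpose on system A of a matrix on C^{d_a} (x) C^{d_b}."""
    r = rho.reshape(d_a, d_b, d_a, d_b)
    return r.transpose(2, 1, 0, 3).reshape(d_a * d_b, d_a * d_b)

def trace_norm(m):
    return float(np.sum(np.linalg.svd(m, compute_uv=False)))

def random_state(d, rng):
    """Random full-rank density matrix (assumed test input)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def log_hermitian(m):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.conj().T

def relative_entropy(rho, sigma):
    """Tr rho (log rho - log sigma); sigma need not have unit trace."""
    return float(np.trace(rho @ (log_hermitian(rho) - log_hermitian(sigma))).real)

def sdp_objective(rho, sigma, d_a, d_b):
    """D(rho||sigma) + log ||tau^A(sigma)||_1, the quantity minimized in E_SDP."""
    return relative_entropy(rho, sigma) + float(np.log(trace_norm(partial_transpose_a(sigma, d_a, d_b))))

def rescaled_objective(rho, sigma, d_a, d_b):
    """D(rho||sigma') with sigma' = sigma / ||tau^A(sigma)||_1."""
    sigma_prime = sigma / trace_norm(partial_transpose_a(sigma, d_a, d_b))
    return relative_entropy(rho, sigma_prime)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho, sigma = random_state(4, rng), random_state(4, rng)
    print(sdp_objective(rho, sigma, 2, 2), rescaled_objective(rho, sigma, 2, 2))
```

The rescaled matrix $\sigma'$ always satisfies $\|\tau^A(\sigma')\|_1 = 1$, which is the normalization used in the alternative formula for the SDP bound.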


Since we have one-to-one correspondence between the state $\sigma$ and the positive semidefinite matrix $\sigma'$ satisfying the condition $\|\tau^A(\sigma')\|_1 = 1$, we have another formula
for SDP bound as

\[ E_{\mathrm{SDP}}(\rho) = \min_{\sigma' \ge 0:\, \|\tau^A(\sigma')\|_1 = 1} D(\rho \| \sigma') = \min_{\sigma' \ge 0:\, \|\tau^A(\sigma')\|_1 \le 1} D(\rho \| \sigma'). \tag{8.231} \]

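Using this notation, we can show the convexity.⁶ That is, for two states $\rho_1$ and $\rho_2$
and a real number $\lambda \in (0, 1)$, we have

\[ E_{\mathrm{SDP}}(\lambda \rho_1 + (1-\lambda) \rho_2) \le \lambda E_{\mathrm{SDP}}(\rho_1) + (1-\lambda) E_{\mathrm{SDP}}(\rho_2). \tag{8.232} \]

To show this inequality, we consider the state $\lambda \rho_1 \otimes |0,0\rangle\langle 0,0| + (1-\lambda) \rho_2 \otimes |1,1\rangle\langle 1,1|$.
Choose $\sigma'_i := \operatorname{argmin}_{\sigma' \ge 0:\, \|\tau^A(\sigma')\|_1 \le 1} D(\rho_i \| \sigma')$. Applying the monotonicity (a) of Exercise 5.25 to the partial trace, we have

\[ E_{\mathrm{SDP}}(\lambda \rho_1 + (1-\lambda) \rho_2) \le D(\lambda \rho_1 + (1-\lambda) \rho_2 \,\|\, \lambda \sigma'_1 + (1-\lambda) \sigma'_2) \]
\[ \le D\big(\lambda \rho_1 \otimes |0,0\rangle\langle 0,0| + (1-\lambda) \rho_2 \otimes |1,1\rangle\langle 1,1| \,\big\|\, \lambda \sigma'_1 \otimes |0,0\rangle\langle 0,0| + (1-\lambda) \sigma'_2 \otimes |1,1\rangle\langle 1,1|\big) \]
\[ = \lambda D(\rho_1 \| \sigma'_1) + (1-\lambda) D(\rho_2 \| \sigma'_2) = \lambda E_{\mathrm{SDP}}(\rho_1) + (1-\lambda) E_{\mathrm{SDP}}(\rho_2). \]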


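Proof of (8.223) Consider a PPT operation $\kappa'_n$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ and a real number
$r > D(\rho \| \sigma) + \log \|\tau^A(\sigma)\|_1$. Inequalities (8.222) and (8.220) imply that $\langle \Phi_{e^{nr}} | \kappa'_n(\sigma^{\otimes n}) | \Phi_{e^{nr}} \rangle \le e^{-nr} \|\tau^A(\kappa'_n(\sigma^{\otimes n}))\|_1 \le e^{-nr} \|\tau^A(\sigma)\|_1^n$. From $I - |\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}| \ge 0$
we have

\[ I - (\kappa'_n)^*(|\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}|) = (\kappa'_n)^*(I) - (\kappa'_n)^*(|\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}|) = (\kappa'_n)^*(I - |\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}|) \ge 0, \]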

⁶ If we employ the original definition, the inequality (8.232) cannot be shown by the concavity of log.
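where $(\kappa'_n)^*$ is the dual map of $\kappa'_n$ (see Theorem 5.1). Moreover,

\[ (\kappa'_n)^*(|\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}|) \ge 0, \]
\[ \operatorname{Tr} \sigma^{\otimes n} (\kappa'_n)^*(|\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}|) = \langle \Phi_{e^{nr}} | \kappa'_n(\sigma^{\otimes n}) | \Phi_{e^{nr}} \rangle \le e^{-n(r - \log \|\tau^A(\sigma)\|_1)}. \]

Since the matrix $(\kappa'_n)^*(|\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}|)$ satisfies the condition of test $0 \le (\kappa'_n)^*(|\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}|) \le I$, inequality (3.138) in Sect. 3.8 yields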


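\[ \langle \Phi_{e^{nr}} | \kappa'_n(\rho^{\otimes n}) | \Phi_{e^{nr}} \rangle = \operatorname{Tr} \rho^{\otimes n} (\kappa'_n)^*(|\Phi_{e^{nr}}\rangle\langle\Phi_{e^{nr}}|) \le e^{n \frac{-\phi(s) - s(r - \log \|\tau^A(\sigma)\|_1)}{1-s}} \tag{8.233} \]

for $s \le 0$, where $\phi(s) \stackrel{\mathrm{def}}{=} \phi(s|\rho\|\sigma)$. Using arguments similar to those used
for the Proof of Lemma 3.7, the condition $r - \log \|\tau^A(\sigma)\|_1 > D(\rho \| \sigma)$ implies
$\langle \Phi_{e^{nr}} | \kappa'_n(\rho^{\otimes n}) | \Phi_{e^{nr}} \rangle \to 0$. We thus obtain (8.223). ∎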


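Further, using the log negativity, Rains [45] showed that

\[ E_{d,2}^{\mathrm{PPT}}(\rho_1) + E_{d,2}^{\mathrm{PPT}}(\rho_2) \le E_{d,2}^{\mathrm{PPT}}(\rho_1 \otimes \rho_2) \le E_{d,2}^{\mathrm{PPT}}(\rho_1) + \log \|\tau^A(\rho_2)\|_1. \tag{8.234} \]

Indeed, Donald and Horodecki [33] proved Condition E3 for $E_{r,\mathrm{PPT}}(\rho)$. Therefore,
since $E_{r,\mathrm{PPT}}(\rho)$ satisfies Conditions E1, E2PPT, and E4 in a manner similar to
$E_{r,S}(\rho)$, Theorem 8.10 guarantees the inequality

\[ \lim_{n\to\infty} \frac{E_{r,\mathrm{PPT}}(\rho^{\otimes n})}{n} \le E_c^{\mathrm{PPT}}(\rho). \tag{8.235} \]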

In inequality (8.224), the log negativity gives the upper bound of the entanglement
of distillation; however, it does not give the lower bound of the entanglement of
cost because $\log \|\tau^A(|u\rangle\langle u|)\|_1 = 2 \log\big(\sum_i \sqrt{\lambda_i}\big) \ge -\sum_i \lambda_i \log \lambda_i = E_c^{S}(|u\rangle\langle u|)$.
Thus, it does not satisfy Condition E3 (continuity) because Theorem 8.10 leads to
a contradiction if it holds (Exercise 8.69). In this case, we can show the following
lemma as its alternative.


Lemma 8.15 When the quantity Ec(p) satisfies Conditions E1 and E2C, the entan- 
glement of exact distillation and the entanglement of exact cost are evaluated as 


ES,(p) < E°(p) < ES,(p). 


Further, if it satisfies Condition E4 also, their asymptotic versions are evaluated as


EC @n 
*(p) < lim de® EC (p), 


a 


Hence, from (8.221) we have the following formula for the exact cost with PPT
operations [47]:

\[ \log \|\tau^A(\rho)\|_1 \le E_{c,e}^{\mathrm{PPT},\infty}(\rho). \tag{8.236} \]

Further, Audenaert et al. [47] showed the opposite inequality

\[ E_{c,e}^{\mathrm{PPT},\infty}(\rho) \le \log\big( \|\tau^A(\rho)\|_1 + d_A d_B \max(0, -\lambda_{\min}(\tau^A(|\tau^A(\rho)|))) \big), \]

where $\lambda_{\min}(X)$ denotes the minimum eigenvalue of $X$. Hence, when

\[ \tau^A(|\tau^A(\rho)|) \ge 0, \tag{8.237} \]

we obtain

\[ \log \|\tau^A(\rho)\|_1 = E_{c,e}^{\mathrm{PPT},\infty}(\rho). \]

For example, from (8.217) any pure state satisfies condition (8.237). Further, Ishizaka
[48] proved that all states on the system $\mathbb{C}^2 \otimes \mathbb{C}^2$ satisfy this condition. Therefore,
the entanglement measures for a pure state $\rho = |u\rangle\langle u|$ are summarized as follows.
Let $\lambda$ be a probability distribution of the eigenvalues of the reduced density $\operatorname{Tr}_B \rho$.
Then, each entanglement measure is described by the Rényi entropy $H_{1-s}(\lambda) \stackrel{\mathrm{def}}{=} \frac{\psi(s|\lambda)}{s} = \frac{1}{s} \log \sum_i \lambda_i^{1-s}$ as (Exercises 8.32 and 8.68)


I I I I (8.238) 
Amin (A) = H(A) = Ai(A) Amax(A), 


Ei DSi Sf) Oa FPO) ser 


IA 


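where $i = 1, 2$, $C_1 = \rightarrow, \leftrightarrow, S, \mathrm{PPT}$, $C_2 = \emptyset, \rightarrow, \leftrightarrow, S, \mathrm{PPT}$, $C_3 = \dashrightarrow, \rightarrow, \leftrightarrow, S, \mathrm{PPT}$, and $C_4 = \rightarrow, \leftrightarrow, S$. Remember that the quantity $H_{1-s}(\lambda)$ is monotone
increasing for $s$ (Sect. 2.1.4).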

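To conclude this section, we briefly discuss the relationship between $E_{d,2}^{\leftrightarrow}(\rho)$
and Theorem 8.3 [49]. In Theorem 8.3, we derived $\rho^{A,B} \preceq \rho^A$ from the fact that $\rho^{A,B}$
is separable. In fact, there exist several other conditions regarding less entanglement:


LE1 (Separability) $\rho^{A,B}$ is separable.

LE2 (PPT) $\tau^A \otimes \iota_B(\rho^{A,B}) \ge 0$.

LE3 (Nondistillability) $E_{d,2}^{\leftrightarrow}(\rho^{A,B}) = 0$.

LE4 (Reduction) $\rho^A \otimes I_B \ge \rho^{A,B}$ and $I_A \otimes \rho^B \ge \rho^{A,B}$.
LE5 (Majorization) $\rho^{A,B} \preceq \rho^A$ and $\rho^{A,B} \preceq \rho^B$.


The relations between these conditions can be summarized as follows:

\[ \mathrm{LE1} \Rightarrow \mathrm{LE2} \Rightarrow \mathrm{LE3} \Rightarrow \mathrm{LE4} \Rightarrow \mathrm{LE5}, \]

where LE2 $\Rightarrow$ LE3 is due to Horodecki [50], LE3 $\Rightarrow$ LE4 to Horodecki [51], and LE4 $\Rightarrow$ LE5 to Hiroshima [52].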

In particular, a nondistillable state is called a bound entangled state when it is not 
separable. The relation LE2=LE1 (Theorem 5.5) has been shown only for C? @ C? 
and C? @ C?. Hence, there is no bound entangled state on the C? @ C? system. 
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However, a counterexample, i.e., a bound entangled state, exists for LE1<=LE2 on 
C? @C* and C? @ C? [53]. Since any PPT state can be produced by a PPT operation 
without any entangled state, this counterexample provides an interesting insight. 
That is, there exists a separable state p’ and a PPT operation x’ such that «’(p’) is not 
separable. Further, it is known that the relation LEZ LE4 holds for C? @ C"™*8” 
[51, 54, 55]. 

As easily checked, Condition LE1 is equivalent to the conditions E? ,(p) = 0 
and E;(p) = 0. Since Ef(x’(p’)) is not 0, Ey is not monotone for PPT opera- 
tions. Further, examining the quantity C//~(p), Yang et al. [56] showed that if the 
entanglement of cost E.(p) is zero, p is separable. That is, a nonseparable state has 
nonzero entanglement of cost. Hence, E, is not monotone for PPT operations. 

Further, for any nonseparable state o, there exist a state p and an integer L such 
that [57] 


ES (p) < Eg ,(p @0), 


which implies that E.(a) > 0. 

In addition, a counterexample also exists for LE4<¢LE5 when C” @ C? [58]. 
However, it is an open problem whether the opposite relation LE2<=LE3 holds. To 
discuss it in greater detail, we focus on the following relation: 


ae __ Espp(p®") __ E,ppr(p®") 
Eq3(p) S$ Ega(p) S Eqn (p) S lim ——~—— < lim ———— 


n noo n 
<Ep*"(p) S Eee (9) S Eee (). 
Since any PPT state can be produced by a PPT operation without any entangled 
state, Condition LE2 is equivalent to the condition E{""(p) = 0. From (8.236) it 
is also equivalent to the condition Ba) = 0. Therefore, if Condition LE2 is 
equivalent to Condition LE3, these conditions hold if and only if one of the above 
values is equal to zero. 


Exercises 


8.62 Let $\tilde{\tau}^A$ be a partial transpose concerning another (the second) basis. Show that there exists a unitary $U$ such that $\tilde{\tau}^A(\rho) = U(\tau^A(U^* \rho U))U^*$.


8.63 Let $\kappa$ be a map from the set of Hermitian matrices on $\mathcal{H}$ to that on $\mathcal{H}'$. Show that $\tau' \circ \kappa \circ \tau$ is a CP map if and only if $\kappa$ is a CP map, where $\tau$ and $\tau'$ are the transposes on $\mathcal{H}$ and $\mathcal{H}'$, respectively. Show that $\|\tau(X)\|_1 = \|X\|_1$ for a Hermitian matrix $X$ on $\mathcal{H}$.


8.64 Show that $\tau^A \circ \kappa \circ \tau^A$ is TP-CP if and only if $\tau^B \circ \kappa \circ \tau^B$ is TP-CP.


8.65 Show that $\tau^A \circ \kappa \circ \tau^A$ is TP-CP if and only if $\tilde{\tau}^{A'} \circ \kappa \circ \tilde{\tau}^A$ is TP-CP when $\tilde{\tau}^A$ and $\tilde{\tau}^{A'}$ are the partial transposes for other bases.

8.66 Show that $\|\tau^A(\rho)\|_1 = \|\tilde{\tau}^A(\rho)\|_1$ when $\tilde{\tau}^A$ is the partial transpose for another basis.


8.67 Show that the maximally correlated state pq satisfies 


t(r —D 7 
EP (1p) S daa (r 141 (Palla) 


8.239 
1>0 1+t ( ) 


where E/')'*(r|pa) is defined in (8.91) and gq is given in (8.142). (8.239) is a 
generalization of (8.160). 


8.68 Prove the following equation for an entangled pure state $|x\rangle$:

\[ E^{\mathrm{PPT}}_{d,e}(|x\rangle\langle x|) = -\log \lambda_1^{\downarrow}. \tag{8.240} \]


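(a) Prove the following inequality as a generalization of (8.222):

\[ \mathrm{Tr}\,\sigma\rho \le \|\tau^A(\sigma)\|\,\|\tau^A(\rho)\|_1. \tag{8.241} \]

(b) Prove

\[ \max_{\sigma \in S_{\mathrm{PPT}}} \mathrm{Tr}\,|\Phi_d\rangle\langle\Phi_d|\,\kappa(\sigma) \le \frac{1}{d} \tag{8.242} \]

for $\kappa \in \mathrm{PPT}$, where $S_{\mathrm{PPT}}$ is the set of positive partial transpose states.

(c) Show the following by using (8.217):

\[ \max_{\sigma \in S_{\mathrm{PPT}}} \mathrm{Tr}\,|x\rangle\langle x|\,\sigma = \lambda_1^{\downarrow}. \tag{8.243} \]

(d) Prove (8.240) by combining (8.242) and (8.243) in a way similar to (8.96).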


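8.69 Check the following counterexample of the continuity of $2\log\|\tau^A(|x\rangle\langle x|)\|_1$ as follows [59].
(a) Show that $\|\rho^{\otimes n} - \rho_n\|_1 \to 0$, i.e., $F(\rho^{\otimes n}, \rho_n) \to 1$, where

\[ \rho_n \stackrel{\mathrm{def}}{=} \frac{\{e^{-n(H(\rho)+\epsilon)} \le \rho^{\otimes n} \le e^{-n(H(\rho)-\epsilon)}\}\,\rho^{\otimes n}}{\mathrm{Tr}\,\{e^{-n(H(\rho)+\epsilon)} \le \rho^{\otimes n} \le e^{-n(H(\rho)-\epsilon)}\}\,\rho^{\otimes n}}. \]

(b) Show that $H(\rho) - \epsilon \le \frac{1}{n} H_\alpha(\rho_n) \le H(\rho) + 2\epsilon$ for $\alpha > 0$ and sufficiently large $n$.
(c) Check that the purifications $x_n$, $y_n$ of $\rho^{\otimes n}$, $\rho_n$ give a counterexample of the continuity of $2\log\|\tau^A(|x\rangle\langle x|)\|_1$.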


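8.70 Let $A$, $B$, and $C$ be $n \times n$ matrices. Show that $\begin{pmatrix} A & B^* \\ B & C \end{pmatrix} \ge 0$ when $\begin{pmatrix} A+C & 0 \\ 0 & A+C \end{pmatrix} - \begin{pmatrix} A & B \\ B^* & C \end{pmatrix} \ge 0$ by using the unitary $\begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$. This argument means that the reduction criterion LE4 implies the PPT condition LE2 on the system $\mathbb{C}^2 \otimes \mathbb{C}^n$ [55].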
8.71 Define the SDP bounds with relative Rényi entropy.
E1ysisp(p) = min Diss(pllo) + log |I74() Ih, (8.244) 
Ex4s\spe(o) = min Dy, (olla) + log [Ir4(o) Ih (8.245) 
Show the following relations similar to (8.231) 


Ej+s\spp(p) = min Pits spp (plla’) 
Oo 


>0:|74 (0) |h= 
= min D14s\spp(plla’), (8.246) 
o>0:||74 (o)||1<1 
E = D : 
1+s| SDP (P) poy spp (Plo) 
= min Diss spp(P a’). (8.247) 


o/>=0:||74 (oe) || 1<1 
8.72 Show the following relations similar to (8.231) 
eo E1+sispp (Api +(—A)p2) 


> es Fissispe (oy) +(- Ayer E+sispr (2) for s € [—1, 0], (8.248) 
eS E1+sispp Api + (1A) po) 


<)erEltsispr (rr) +(1- Ayes Bitsispe(y2) for s € [0, 1], (8.249) 


est its spp (Api t+ (1—A)p2) 


> es Ei+sispr (1) +(- des Et+sispP(p2) fors € I-> 0], (8.250) 


eo Ess spp (Api+(1—A)p2) 


<)eSEt+sispe(p1) +(1- NJ es EvssispP(p2) for s € [0, 00). (8.251) 


8.73 We define Ey')"*(r|p) similar to E73 (r|p) defined in (8.239), Show the fol- 
lowing relations 


t(r — E+: spp (p)) 
ES* Ss : 8.252 
a2" |p) = sg aa ( ) 


t(r — Egy spp(p)) 
1+t ; 


E73 (r|p) = max (8.253) 
8.13 Violation of Superadditivity of Entanglement Formation


8.13.1 Counter Example for Superadditivity of Entanglement Formation


In this section, we give a counterexample for superadditivity of entanglement formation due to Fukuda [60], while the first counterexample was given by Hastings [61]. In order to give a counterexample for superadditivity of entanglement formation (8.145), we consider a large bipartite system $\mathbb{C}^k \otimes \mathbb{C}^n = \mathbb{C}^{kn}$, in which the system $\mathcal{H}_{A,1}$ is given as $\mathbb{C}^k$ and the other system $\mathcal{H}_{B,1}$ is given as $\mathbb{C}^n$. Then, we focus on a $[cn]$-dimensional subspace $\mathcal{K}$ and its complex conjugate subspace $\bar{\mathcal{K}}$ defined as

\[ \bar{\mathcal{K}} := \{x \in \mathbb{C}^{kn} \mid \bar{x} \in \mathcal{K}\}. \tag{8.254} \]

In the following, we regard the space $\bar{\mathcal{K}}$ as a subspace of $\mathcal{H}_{A,2} \otimes \mathcal{H}_{B,2}$, in which the system $\mathcal{H}_{A,2}$ is given as $\mathbb{C}^k$ and the other system $\mathcal{H}_{B,2}$ is given as $\mathbb{C}^n$. Then, we obtain the following lemma for the bipartite system $\mathcal{H}_A \otimes \mathcal{H}_B$, where $\mathcal{H}_A := \mathcal{H}_{A,1} \otimes \mathcal{H}_{A,2}$ and $\mathcal{H}_B := \mathcal{H}_{B,1} \otimes \mathcal{H}_{B,2}$.


Lemma 8.16 Any $[cn]$-dimensional subspace $\mathcal{K}$ satisfies

\[ \min_{|x\rangle \in \mathcal{K} \otimes \bar{\mathcal{K}}} E(|x\rangle\langle x|) \le 2\Bigl(1 - \frac{c}{k}\Bigr)\log k + h\Bigl(\frac{c}{k}\Bigr). \tag{8.255} \]


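Proof First, let $V$ be the isometry from $\mathcal{H}_{C,1} := \mathbb{C}^{[cn]}$ to $\mathcal{H}_{A,1} \otimes \mathcal{H}_{B,1}$, whose image is the subspace $\mathcal{K}$. Then, the complex conjugate $\bar{V}$ is the isometry from $\mathcal{H}_{C,2} := \mathbb{C}^{[cn]}$ to $\mathcal{H}_{A,2} \otimes \mathcal{H}_{B,2}$, whose image is the subspace $\bar{\mathcal{K}}$. Then, we have

\[ \mathrm{Tr}\, V V^* = [cn]. \tag{8.256} \]

In this proof, we denote the maximally entangled states $\frac{1}{\sqrt{k}}\sum_i |u_i, u_i\rangle$, $\frac{1}{\sqrt{n}}\sum_i |u_i, u_i\rangle$, and $\frac{1}{\sqrt{[cn]}}\sum_i |u_i, u_i\rangle$ on $\mathcal{H}_{A,1} \otimes \mathcal{H}_{A,2}$, $\mathcal{H}_{B,1} \otimes \mathcal{H}_{B,2}$, and $\mathcal{H}_{C,1} \otimes \mathcal{H}_{C,2}$ by $|\Phi_A\rangle$, $|\Phi_B\rangle$, and $|\Phi_C\rangle$, respectively, where $u_i$ is the canonical basis.

Now, we focus on the state $(V \otimes \bar{V})|\Phi_C\rangle\langle\Phi_C|(V^* \otimes \bar{V}^*)$ in $\mathcal{H}_A \otimes \mathcal{H}_B$. Due to (8.256), the maximal eigenvalue of $\mathrm{Tr}_B (V \otimes \bar{V})|\Phi_C\rangle\langle\Phi_C|(V^* \otimes \bar{V}^*)$ is bounded from below by the following quantity.

\begin{align*}
&\langle\Phi_A|\bigl(\mathrm{Tr}_B (V \otimes \bar{V})|\Phi_C\rangle\langle\Phi_C|(V^* \otimes \bar{V}^*)\bigr)|\Phi_A\rangle \\
&\ge \langle\Phi_A, \Phi_B|(V \otimes \bar{V})|\Phi_C\rangle\langle\Phi_C|(V^* \otimes \bar{V}^*)|\Phi_A, \Phi_B\rangle \\
&= |\langle\Phi_A, \Phi_B|(V \otimes \bar{V})|\Phi_C\rangle|^2 = \Bigl|\frac{1}{\sqrt{nk[cn]}}\mathrm{Tr}\, V V^*\Bigr|^2 = \frac{[cn]^2}{nk[cn]} = \frac{[cn]}{nk}.
\end{align*}

Recall Exercise 2.3. Then, the above constraint for the maximal eigenvalue yields that

\begin{align*}
H\bigl(\mathrm{Tr}_B (V \otimes \bar{V})|\Phi_C\rangle\langle\Phi_C|(V^* \otimes \bar{V}^*)\bigr)
&\le h\Bigl(\frac{[cn]}{nk}\Bigr) + \Bigl(1 - \frac{[cn]}{nk}\Bigr)\log(k^2 - 1) \\
&\le h\Bigl(\frac{c}{k}\Bigr) + \Bigl(1 - \frac{c}{k}\Bigr)\log(k^2 - 1) \le 2\Bigl(1 - \frac{c}{k}\Bigr)\log k + h\Bigl(\frac{c}{k}\Bigr).
\end{align*}
□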
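The key step of the above proof, namely that the state $(V \otimes \bar{V})|\Phi_C\rangle$ has squared overlap $[cn]/(nk)$ with $|\Phi_A, \Phi_B\rangle$, can be checked numerically. The following is a minimal sketch with small, hypothetical dimensions `k`, `n`, `l` (where `l` plays the role of $[cn]$) and a random subspace built by Gram-Schmidt orthogonalization; it is not part of the original argument.

```python
import math
import random

random.seed(0)
k, n, l = 2, 3, 2     # dim H_{A,1} = k, dim H_{B,1} = n, dim K = l (stands in for [cn])

def gram_schmidt(vectors):
    """Orthonormalize complex vectors with respect to <b|w> = sum conj(b_i) w_i."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            ip = sum(bc.conjugate() * wc for bc, wc in zip(b, w))
            w = [wc - ip * bc for wc, bc in zip(w, b)]
        norm = math.sqrt(sum(abs(wc) ** 2 for wc in w))
        basis.append([wc / norm for wc in w])
    return basis

raw = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(k * n)]
       for _ in range(l)]
vs = gram_schmidt(raw)   # columns of the isometry V; component index is a*n + b

def psi(a1, b1, a2, b2):
    """Amplitude of (V (x) conj(V))|Phi_C> on |a1,b1> (x) |a2,b2>."""
    return sum(v[a1 * n + b1] * v[a2 * n + b2].conjugate() for v in vs) / math.sqrt(l)

# |<Phi_A, Phi_B| psi>|^2, which the proof evaluates as [cn]/(nk).
overlap_sq = abs(sum(psi(a, b, a, b) for a in range(k) for b in range(n))) ** 2 / (k * n)

# <Phi_A| Tr_B |psi><psi| |Phi_A>, a lower bound on the maximal eigenvalue.
fidelity_A = sum(abs(sum(psi(a, b1, a, b2) for a in range(k))) ** 2
                 for b1 in range(n) for b2 in range(n)) / k

q = l / (k * n)
h = -q * math.log(q) - (1 - q) * math.log(1 - q)
print(overlap_sq, fidelity_A, 2 * (1 - q) * math.log(k) + h)
```

The first printed value equals `l / (k * n)` up to rounding, the second is at least as large, and the third is the right-hand side of (8.255) with $c/k$ replaced by `l / (k * n)`.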


On the other hand, we have the following theorem with respect to the Rényi entropy of order 2, which satisfies $H_2(\mathrm{Tr}_B |x\rangle\langle x|) \le H(\mathrm{Tr}_B |x\rangle\langle x|)$.


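Theorem 8.14 For given constants $c \in (0, 1)$, $\epsilon > 0$, $\epsilon' > 0$, and a positive integer $k$, there exist a sufficiently large $n$ and a $[cn]$-dimensional subspace $\mathcal{K} \subset \mathbb{C}^{kn} = \mathcal{H}_A \otimes \mathcal{H}_B$ such that

\begin{align}
\min_{x \in \mathcal{K} \cap S^{2[cn]-1}} H_2(\mathrm{Tr}_B |x\rangle\langle x|)
&\ge \log k - \epsilon - \log\Biggl(1 + \biggl(\frac{\bigl(-4c\log\frac{\epsilon'}{2}\bigr) + 2\sqrt{-2c\log\frac{\epsilon'}{2}}\sqrt{1 - 2c\log\frac{\epsilon'}{2}}}{1 - \epsilon'}\biggr)^2 \frac{1}{k}\Biggr) \nonumber \\
&\ge \log k - \epsilon - \biggl(\frac{\bigl(-4c\log\frac{\epsilon'}{2}\bigr) + 2\sqrt{-2c\log\frac{\epsilon'}{2}}\sqrt{1 - 2c\log\frac{\epsilon'}{2}}}{1 - \epsilon'}\biggr)^2 \frac{1}{k}. \tag{8.257}
\end{align}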


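Any state $\rho$ on $\mathcal{K}$ satisfies that

\begin{align*}
E_f(\rho) &\ge \min_{x \in \mathcal{K} \cap S^{2[cn]-1}} E(|x\rangle\langle x|) = \min_{x \in \mathcal{K} \cap S^{2[cn]-1}} H(\mathrm{Tr}_B |x\rangle\langle x|) \\
&\ge \min_{x \in \mathcal{K} \cap S^{2[cn]-1}} H_2(\mathrm{Tr}_B |x\rangle\langle x|).
\end{align*}

When the subspace $\mathcal{K}$ is chosen by the above theorem, any state $\rho$ on $\mathcal{K}$ satisfies that

\[ E_f(\rho) \ge \log k - \epsilon - \log\Biggl(1 + \biggl(\frac{\bigl(-4c\log\frac{\epsilon'}{2}\bigr) + 2\sqrt{-2c\log\frac{\epsilon'}{2}}\sqrt{1 - 2c\log\frac{\epsilon'}{2}}}{1 - \epsilon'}\biggr)^2 \frac{1}{k}\Biggr). \]

The complex conjugate subspace $\bar{\mathcal{K}}$ also has this property.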

Now, we choose a $[cn]$-dimensional subspace $\mathcal{K} \subset \mathbb{C}^{kn} = \mathcal{H}_A \otimes \mathcal{H}_B$ given in Theorem 8.14. Then, we choose a pure state $\rho$ in $\mathcal{K} \otimes \bar{\mathcal{K}}$ that realizes the minimum entropy evaluated in Lemma 8.16. Therefore, fixing a real number $c$ and taking $k$ to be large, we obtain


Cc Cc Cc 1 
E;(p) = E(p) < 2(1 _ =} logk +h (7) = 2logk — preete z Bk 


1 
<2logk +o (F102) 
! i é ‘ 
2, (—4c log $) + 2,/ 2c log ¢,/1 2clog<\ | 


=2logk 
ia k l-€é€ k 
SE (Tro p) + Ey (Tr p), (8.258) 


which contradicts the superadditivity of entanglement formation (8.145). 


8.13.2 Proof of Theorem 8.14 


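Firstly, we notice that

\begin{align}
e^{-H_2(\mathrm{Tr}_B |x\rangle\langle x|)} &= \mathrm{Tr}(\mathrm{Tr}_B |x\rangle\langle x|)^2 = \|\mathrm{Tr}_B |x\rangle\langle x|\|_2^2 \nonumber \\
&= \frac{1}{k} + \|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2^2 \tag{8.259}
\end{align}

for any unit vector $x \in \mathcal{H}_A \otimes \mathcal{H}_B$. Hence, we obtain

\[ H_2(\mathrm{Tr}_B |x\rangle\langle x|) = \log k - \log\bigl(1 + k\|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2^2\bigr). \tag{8.260} \]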
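The identity (8.260) is easy to confirm numerically. The sketch below draws a random unit vector in $\mathbb{C}^k \otimes \mathbb{C}^n$ (the dimensions and variable names are hypothetical), forms the reduced density $X X^*$, and compares $-\log \mathrm{Tr}\,\rho^2$ with the right-hand side of (8.260).

```python
import math
import random

random.seed(1)
k, n = 3, 4

# Random unit vector |x>, stored as a k x n matrix X so that Tr_B |x><x| = X X^*.
X = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)] for _ in range(k)]
norm = math.sqrt(sum(abs(z) ** 2 for row in X for z in row))
X = [[z / norm for z in row] for row in X]

rho = [[sum(X[i][m] * X[j][m].conjugate() for m in range(n)) for j in range(k)]
       for i in range(k)]

# Tr rho^2 equals the squared Hilbert-Schmidt norm of the Hermitian matrix rho.
purity = sum(abs(rho[i][j]) ** 2 for i in range(k) for j in range(k))
# ||rho - rho_mix||_2^2 with rho_mix = I/k.
dist_sq = sum(abs(rho[i][j] - (1.0 / k if i == j else 0.0)) ** 2
              for i in range(k) for j in range(k))

h2_direct = -math.log(purity)
h2_formula = math.log(k) - math.log(1 + k * dist_sq)
print(h2_direct, h2_formula)
```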


Next, we prepare the following two lemmas. 


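Lemma 8.17 For given $|x\rangle, |y\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$, we have

\begin{align}
\frac{\bigl|\|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2 - \|\mathrm{Tr}_B |y\rangle\langle y| - \rho_{\mathrm{mix},A}\|_2\bigr|}{d(x, y)} &\le 2, \tag{8.261} \\
\frac{\bigl|\|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2 - \|\mathrm{Tr}_B |y\rangle\langle y| - \rho_{\mathrm{mix},A}\|_2\bigr|}{d(x, y)} &\le 2\sqrt{\max\bigl(\|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2, \|\mathrm{Tr}_B |y\rangle\langle y| - \rho_{\mathrm{mix},A}\|_2\bigr) + \frac{1}{k}}. \tag{8.262}
\end{align}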


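Proof Due to (8.259), (8.261) is a quantum extension of (2.209) in Exercise 2.51 with a modification. However, we show it in a different way. We choose $k \times n$ matrices $X$ and $Y$ as $|x\rangle = |X\rangle$ and $|y\rangle = |Y\rangle$. Since $\|X\|, \|Y\| \le 1$ and $\|X^* - Y^*\|_2 = \|X - Y\|_2 = \||x\rangle - |y\rangle\|$, using Exercise 2.54 and (A.22) with $i = 2$, we obtain

\begin{align}
&\bigl|\|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2 - \|\mathrm{Tr}_B |y\rangle\langle y| - \rho_{\mathrm{mix},A}\|_2\bigr| \nonumber \\
&= \bigl|\|XX^* - \rho_{\mathrm{mix},A}\|_2 - \|YY^* - \rho_{\mathrm{mix},A}\|_2\bigr| \nonumber \\
&\le \|XX^* - YY^*\|_2 = \|X(X^* - Y^*) + (X - Y)Y^*\|_2 \nonumber \\
&\le \|X(X^* - Y^*)\|_2 + \|(X - Y)Y^*\|_2 = \|X(X^* - Y^*)\|_2 + \|Y(X - Y)^*\|_2 \nonumber \\
&\le \|X\|\,\|X^* - Y^*\|_2 + \|Y\|\,\|(X - Y)^*\|_2 \le 2\max(\|X\|, \|Y\|)\,\||x\rangle - |y\rangle\| \nonumber \\
&\stackrel{(a)}{\le} 2\max(\|X\|, \|Y\|)\, d(x, y), \tag{8.263}
\end{align}

where $(a)$ follows from Exercise 2.54. Due to the relation $\max(\|X\|, \|Y\|) \le 1$, we have (8.261).
Since $\|X\|^2 = \|X\| \cdot \|X^*\| = \|XX^*\|$ and

\begin{align*}
\|XX^*\| &\le \|XX^* - \rho_{\mathrm{mix},A}\| + \|\rho_{\mathrm{mix},A}\| \le \|XX^* - \rho_{\mathrm{mix},A}\|_2 + \frac{1}{k} \\
&= \|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2 + \frac{1}{k},
\end{align*}

we have

\[ \|X\| \le \sqrt{\|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2 + \frac{1}{k}}. \tag{8.264} \]

Combining (8.263) and (8.264), we obtain (8.262). □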
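The Lipschitz bound (8.261) can be probed numerically. The following sketch samples pairs of nearby unit vectors and records the largest observed ratio $|f(x) - f(y)|/d(x, y)$ for $f(x) = \|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2$. It assumes the convention $\cos d(x, y) = \mathrm{Re}\langle x|y\rangle$ for the distance; the dimensions, the sampling scheme, and all names are hypothetical choices for this illustration.

```python
import math
import random

random.seed(3)
k, n = 3, 5

def random_unit(dim):
    v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]

def f(x):
    """||Tr_B |x><x| - I/k||_2 for x in C^k (x) C^n with component index a*n + b."""
    total = 0.0
    for i in range(k):
        for j in range(k):
            entry = sum(x[i * n + m] * x[j * n + m].conjugate() for m in range(n))
            if i == j:
                entry -= 1.0 / k
            total += abs(entry) ** 2
    return math.sqrt(total)

def angle(x, y):
    """d(x, y) defined by cos d(x, y) = Re <x|y> (assumed convention)."""
    re = sum((a.conjugate() * b).real for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, re)))

worst_ratio = 0.0
for _ in range(500):
    x = random_unit(k * n)
    z = random_unit(k * n)
    step = random.uniform(0.01, 1.0)
    y_raw = [a + step * b for a, b in zip(x, z)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in y_raw))
    y = [c / norm for c in y_raw]
    worst_ratio = max(worst_ratio, abs(f(x) - f(y)) / angle(x, y))

print(worst_ratio)
```

In line with (8.261), the printed ratio does not exceed 2; random sampling of this kind only illustrates the bound and does not search for its extremal pairs.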


Now we prepare the following lemma. 


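Lemma 8.18 We assume that the pure state $|x\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$ is generated subject to the invariant measure $\mu$ given in Sect. 2.6. Then, we have

\begin{align}
\mathrm{E}_\mu \mathrm{Tr}(\mathrm{Tr}_B |x\rangle\langle x|)^2 &= \frac{k+n}{nk+1} = \frac{1 + k/n}{k + 1/n}, \tag{8.265} \\
\mathrm{E}_\mu \|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2 &\le \sqrt{\frac{1 - 1/k^2}{n + 1/k}}, \tag{8.266}
\end{align}

where $\mathrm{E}_\mu$ denotes the expectation with respect to the measure $\mu$.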


Using Theorem 2.11, and Lemmas 8.17 and 8.18 we can show the following 
lemma. 


Lemma 8.19 Fora given 6 > 0, we choose C; := 264 2,fo2 bi+y/ aie ba) peas 


When the pure state |x) € Hs ® Hep generated subject to the invariant distribution, 
the relation 


|| Tre |x) (x| I a Lae ee 3 (8.267) 
TB IX) (x Pmix,A||2 > n+1/k nk —1 6 : 


holds at most with the probability eo Fnk-1) 79, 


Proof Due to Lemma 8.18, it is sufficient to show that the relation 


T 
| Tra oe) Oe] — Pmnixa lla > Enell Tee bx) (1 — Pmix.alla +f ——> + Cod 
holds at most with the probability e~* *—) /2, The relation (8.261) guarantees that 
the function f : |x) +> || Trg |x)(x| — pmix,all2 has the Lipschitz constant Co = 2 
(See (2.233).). If the Lipschitz constant of the function f is bounded by Cs; on the 


subset (|x) € $2"! NH, @Halll Tra |x) el—pmix,all2 < Vf E+ / Ga +56}, 
Theorem 2.11 yields (8.267). Hence, it is sufficient to show the above relation for the 


Lipschitz constant of the function f. For this purpose, due to (8.262), it is enough to 
show that 


1 
2 max Trp |x) (x| — Pmix,all, || Tra ly) (yl — Pmix,alle) + k < Cs = (8.268) 


for elements x, y of the subset, which is equivalent with 


1 — 1/k 
n+I1/k 


1 1 
4/ 7 + C56 + k < C5. (8.269) 


Solving the quadratic equation for $C_\delta$, we can check that the above inequality holds under our choice of $C_\delta$. □


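Lemma 8.20 Given an $l$-dimensional subspace $\mathcal{K}$ of $\mathcal{H}_A \otimes \mathcal{H}_B$, any $\epsilon$-net $\Omega$ of $\mathcal{K} \cap S^{2l-1}$ satisfies that

\[ \max_{x \in \mathcal{K} \cap S^{2l-1}} \|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2 \le \max_{x \in \Omega} \frac{\|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2}{1 - 2\sin\epsilon}. \tag{8.270} \]

Proof For any $x \in \mathcal{K} \cap S^{2l-1}$, we choose $y \in \Omega$ such that $d(x, y) \le \epsilon$. Thus, Exercise 2.52 implies that $\||x\rangle\langle x| - |y\rangle\langle y|\|_1 \le 2\sin\epsilon$. Since the rank of $|x\rangle\langle x| - |y\rangle\langle y|$ is two, there exist two unit vectors $w_1, w_2 \in \mathcal{K}$ and a positive real number $c \le \sin\epsilon$ such that

\[ |x\rangle\langle x| - |y\rangle\langle y| = c|w_1\rangle\langle w_1| - c|w_2\rangle\langle w_2|. \tag{8.271} \]

Thus, using (A.21), we have

\begin{align*}
&\|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2 \\
&= \|\mathrm{Tr}_B |y\rangle\langle y| - \rho_{\mathrm{mix},A} + c(\mathrm{Tr}_B |w_1\rangle\langle w_1| - \rho_{\mathrm{mix},A}) - c(\mathrm{Tr}_B |w_2\rangle\langle w_2| - \rho_{\mathrm{mix},A})\|_2 \\
&\le \|\mathrm{Tr}_B |y\rangle\langle y| - \rho_{\mathrm{mix},A}\|_2 + c\|\mathrm{Tr}_B |w_1\rangle\langle w_1| - \rho_{\mathrm{mix},A}\|_2 + c\|\mathrm{Tr}_B |w_2\rangle\langle w_2| - \rho_{\mathrm{mix},A}\|_2 \\
&\le \|\mathrm{Tr}_B |y\rangle\langle y| - \rho_{\mathrm{mix},A}\|_2 + 2c \max_{w \in \mathcal{K} \cap S^{2l-1}} \|\mathrm{Tr}_B |w\rangle\langle w| - \rho_{\mathrm{mix},A}\|_2 \\
&\le \max_{y \in \Omega} \|\mathrm{Tr}_B |y\rangle\langle y| - \rho_{\mathrm{mix},A}\|_2 + 2\sin\epsilon \max_{w \in \mathcal{K} \cap S^{2l-1}} \|\mathrm{Tr}_B |w\rangle\langle w| - \rho_{\mathrm{mix},A}\|_2.
\end{align*}

Taking the maximum with respect to $x \in \mathcal{K} \cap S^{2l-1}$, we obtain (8.270). □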


We choose an e-net 2 of KM S*-! by using Lemma 2.11. That is, |Q| < 7/21 — 1 


(Aye Now, we choose the unitary matrix U € U(C"*) subject to the invariant 


distribution. Then, for any element y € 92, the unit vector Uy obeys the uniform 
distribution p: on S”*—!, Now, we apply Theorem 2.11 with f(x) := || Trg |x)(x| — 
Pmix.All2. We choose 5 > 0 such that e~® *-) /2 = 7/21 —1(-+)7"! < JQ\-, 


sin € 


Le., & = —2E1 Jog Ste — 1 Jog(2m/2T— 1). As 2 > 1, we have 


21-1 sin € 1 
log 
2 nk — 


<6 := 


ay log(27V2I — 1). (8.272) 
nk —1 1 


Due to Lemma 8.19, the probability of the following event is greater than 1 — 


|92 le ~)/2, which is strictly greater than zero: The relation 


1— 1/k? 
n+1/k 


TT 
| Trg |[Uy)(Uy| cand Pmix,A|l2 < = + C5, 5n- Vy € 2. 


(8.273) 
Thus, we can choose a unitary U satisfying (8.273) for any y € §2, and define the 


subspace K’ := UK. Therefore, due to Lemma 8.20, any unit vector y of KN $7“! 
satisfies 


1-1/k 
n+I/k mat + C6, 9n 
max || Tre |y) (yl — Pmix,all2 < : (8.274) 
yeK’/ns2-1 1 —2sine 
When we choose «¢ to be 2sine = e€’ and / to be [cn], we have limy-. 6, = 


[—* log ©, limy+oo Cs, = 2,/—*S log ¢ + 2,/1 — * log ¢ because of (8.272). 


kk 
Thus, the RHS of (8.274) goes to 


2 
2¢ é 2c é 1 2c é 
2( — | los 7 +2,/ “log $/} j 108] 


( “log $) +2 2 log ¢./2 * log ¢ 
= . (8.275) 


Therefore, since 


(—4£ log 4) +2) Jog ¢,/1 “log $ 


/ 


l-e 


2 
(—4e log $) + 2,/—2elog $,/1—2clog¢\ 4 
l-€ k’ 
combining (8.260), we obtain (8.257) because $\epsilon$ is an arbitrary constant independent of $n$. More precisely, as a lower bound depending on $n$, we obtain


min  A>(Treg |x)(x|) 


xeKnsieri-! 
/1—1/k? T . 
n+1/k Bs nk—1 =F Ci, On 1 
>logk —log | 1+ ; (8.276) 
1 —2sineé k 


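Proof of Lemma 8.18 In order to show Lemma 8.18, we assume that the pure state $|x\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$ is generated subject to the invariant measure $\mu$ given in Sect. 2.6. We denote the flip operator on $\mathcal{H}_{A,1} \otimes \mathcal{H}_{A,2}$ by $F_A$, and the projection to the symmetric (anti-symmetric) space on $\mathcal{H}_{A,1} \otimes \mathcal{H}_{A,2}$ by $P_{s,A}$ ($P_{a,A}$). Similarly, we define $F_B$, $P_{s,B}$, $P_{a,B}$, $F_{AB}$, $P_{s,AB}$, and $P_{a,AB}$. Then, we have

\[ P_{s,AB} = P_{s,A} \otimes P_{s,B} + P_{a,A} \otimes P_{a,B}. \tag{8.277} \]

By using (8.174), the value (8.259) is calculated as

\begin{align}
\mathrm{Tr}(\mathrm{Tr}_B |x\rangle\langle x|)^2 &= \mathrm{Tr}\,(\mathrm{Tr}_{B,1} |x\rangle\langle x| \otimes \mathrm{Tr}_{B,2} |x\rangle\langle x|) F_A \nonumber \\
&= \mathrm{Tr}\,(|x\rangle\langle x| \otimes |x\rangle\langle x|)(F_A \otimes I_B). \tag{8.278}
\end{align}

Using (8.277) and (8.278), we can calculate the expectation of $\mathrm{Tr}(\mathrm{Tr}_B |x\rangle\langle x|)^2$ as

\begin{align*}
\mathrm{E}_\mu \mathrm{Tr}(\mathrm{Tr}_B |x\rangle\langle x|)^2 &= \mathrm{E}_\mu \mathrm{Tr}\,(|x\rangle\langle x| \otimes |x\rangle\langle x|)(F_A \otimes I_B) \\
&= \mathrm{Tr}\,\frac{1}{nk(nk+1)/2} P_{s,AB} (F_A \otimes I_B) \\
&= \mathrm{Tr}\,\frac{1}{nk(nk+1)/2} P_{s,AB} \bigl((P_{s,A} - P_{a,A}) \otimes (P_{s,B} + P_{a,B})\bigr) \\
&= \frac{1}{nk(nk+1)/2} \mathrm{Tr}\,(P_{s,A} \otimes P_{s,B} - P_{a,A} \otimes P_{a,B}) \\
&= \frac{1}{nk(nk+1)/2} \Bigl(\frac{k(k+1)}{2} \cdot \frac{n(n+1)}{2} - \frac{k(k-1)}{2} \cdot \frac{n(n-1)}{2}\Bigr) \\
&= \frac{1}{nk(nk+1)/2} \cdot \frac{kn(k+n)}{2} = \frac{k+n}{nk+1} = \frac{1 + k/n}{k + 1/n},
\end{align*}

which implies (8.265). Using (8.259), we have

\[ \mathrm{E}_\mu \|\mathrm{Tr}_B |x\rangle\langle x| - \rho_{\mathrm{mix},A}\|_2^2 = \frac{k+n}{nk+1} - \frac{1}{k} = \frac{1 - 1/k^2}{n + 1/k}. \tag{8.279} \]

Using Jensen's inequality with respect to $x \mapsto x^2$, we obtain (8.266). □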
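The average purity (8.265) and the Jensen bound (8.266) can be illustrated by Monte Carlo sampling. The sketch below assumes that normalizing a vector of independent complex Gaussian entries yields a sample from the invariant measure on the unit sphere; the dimensions, sample size, and names are hypothetical choices for this example.

```python
import math
import random

random.seed(2)
k, n, samples = 2, 3, 20000

def purity_of_random_state():
    """Tr (Tr_B |x><x|)^2 for a random unit vector |x> in C^k (x) C^n."""
    X = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
         for _ in range(k)]
    norm_sq = sum(abs(z) ** 2 for row in X for z in row)
    rho = [[sum(X[i][m] * X[j][m].conjugate() for m in range(n)) / norm_sq
            for j in range(k)] for i in range(k)]
    return sum(abs(rho[i][j]) ** 2 for i in range(k) for j in range(k))

purities = [purity_of_random_state() for _ in range(samples)]
estimate = sum(purities) / samples
exact = (k + n) / (n * k + 1)                       # right-hand side of (8.265)

# By (8.259), ||Tr_B|x><x| - rho_mix||_2^2 = purity - 1/k.
mean_dist = sum(math.sqrt(max(p - 1.0 / k, 0.0)) for p in purities) / samples
jensen_bound = math.sqrt((1 - 1 / k ** 2) / (n + 1 / k))   # right-hand side of (8.266)

print(estimate, exact)
print(mean_dist, jensen_bound)
```

The sample mean of the purity approaches $(k+n)/(nk+1)$, and the sample mean of the distance stays below the bound of (8.266), up to statistical fluctuation.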
Exercises 


8.74 Show the following inequality (8.280) instead of Lemma 8.17 by following 
the steps below. 


[|| Tre x) (x1 = pmix,all3 — || Tre» Ly) (yl = pmix,all5| < 22, 


2 (8.280) 
d(x, y) 


where d(x, y) is defined as cos d(x, y) = |(x|y)|. Compare that cosd(x, y) = 
Re(x|y). 
(a) Show Heel ell <2. 

x,y _ 
(b) Show the inequality (8.280) by using (8.278) and (a). (Use a similar discussion 
to Exercise 2.51.) 


8.14 Secure Random Number Generation 


8.14.1 Security Criteria and Their Evaluation 


When a given classical random number $A$ is partially leaked to a third party, the random number is not secure. In this case, it is possible to increase the secrecy by applying a hash function. Now, we assume that the third party, Eve, has the quantum system $\mathcal{H}_E$ correlated to the classical random number $A$, which is described by the $d$-dimensional system $\mathcal{H}_A$ spanned by the CONS $\{u_j\}_{j=1}^d$. Then, the state of the composite system $\mathcal{H}_A \otimes \mathcal{H}_E$ is written as

\[ \rho = \sum_{j=1}^d P_A(j) |u_j\rangle\langle u_j| \otimes \rho_{E|j}. \tag{8.281} \]

The leaked information can be evaluated by the mutual information

\[ I_\rho(A : E) = D(\rho \| \rho_A \otimes \rho_E). \tag{8.282} \]

When we employ the trace norm or the fidelity instead of the relative entropy, the criterion is given as

\begin{align}
d_1(A : E|\rho) &:= \|\rho - \rho_A \otimes \rho_E\|_1, \tag{8.283} \\
F(A : E|\rho) &:= F(\rho, \rho_A \otimes \rho_E). \tag{8.284}
\end{align}

When we take the uniformity of $A$ into account as well as the independence, we employ the quantities

\begin{align}
I'_\rho(A : E) &:= D(\rho \| \rho_{\mathrm{mix},A} \otimes \rho_E) = \log d - H_\rho(A|E) \tag{8.285} \\
&= I_\rho(A : E) + D(\rho_A \| \rho_{\mathrm{mix},A}), \tag{8.286} \\
d'_1(A : E|\rho) &:= \|\rho - \rho_{\mathrm{mix},A} \otimes \rho_E\|_1, \tag{8.287} \\
F'(A : E|\rho) &:= F(\rho, \rho_{\mathrm{mix},A} \otimes \rho_E). \tag{8.288}
\end{align}

The quantity $I'_\rho(A : E)$ is called the modified mutual information, and satisfies the uniqueness under a suitable collection of axioms [62].
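To make the quantities (8.282), (8.285), and (8.286) concrete, the following sketch evaluates them for a small classical-quantum state with $d = 2$ and a qubit held by Eve. The distribution `P_A` and the conditional states `rho_E` are hypothetical, and the entropies are computed from the closed-form eigenvalues of $2 \times 2$ density matrices, using $H_\rho(AE) = H(P_A) + \sum_a P_A(a) H(\rho_{E|a})$ for the block-diagonal state (8.281).

```python
import math

def qubit_entropy(rho):
    """von Neumann entropy (nats) of a 2x2 density matrix via closed-form eigenvalues."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1]).real - abs(rho[0][1]) ** 2
    gap = math.sqrt(max(tr * tr - 4 * det, 0.0))
    eigs = [(tr + gap) / 2, (tr - gap) / 2]
    return -sum(p * math.log(p) for p in eigs if p > 1e-15)

def shannon(P):
    return -sum(p * math.log(p) for p in P if p > 0)

P_A = [0.7, 0.3]                                   # hypothetical distribution of A
rho_E = [[[0.9, 0.1], [0.1, 0.1]],                 # hypothetical rho_{E|1}
         [[0.4, -0.2], [-0.2, 0.6]]]               # hypothetical rho_{E|2}
d = len(P_A)

rho_avg = [[sum(P_A[a] * rho_E[a][i][j] for a in range(d)) for j in range(2)]
           for i in range(2)]

H_A = shannon(P_A)
H_E = qubit_entropy(rho_avg)
H_AE = H_A + sum(P_A[a] * qubit_entropy(rho_E[a]) for a in range(d))

I = H_A + H_E - H_AE                    # mutual information (8.282)
I_mod = math.log(d) - (H_AE - H_E)      # modified mutual information (8.285)
D_unif = math.log(d) - H_A              # D(rho_A || rho_mix,A)

print(I, I_mod, I + D_unif)
```

The last two printed values coincide, which is the decomposition (8.286): the modified quantity penalizes both the correlation with Eve and the non-uniformity of $A$.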

Now, we focus on an ensemble of the hash functions $f_X$ from $\{1, \ldots, d\}$ to $\{1, \ldots, M\}$, where $X$ is a random variable describing the stochastic behavior of the hash function and subject to the distribution $P_X$, because a randomized choice of the hash function makes the evaluation of the above values easy. In this case, the random variable $X$ is independent of the state $\rho$ in the composite system $\mathcal{H}_A \otimes \mathcal{H}_E$, and we denote the system describing the random variable $X$ by $\mathcal{H}_X$. The state of the total system $\mathcal{H}_A \otimes \mathcal{H}_E \otimes \mathcal{H}_X$ is written as

\[ \sum_x P_X(x) |x\rangle\langle x| \otimes \rho_{A,E}, \tag{8.289} \]

which is denoted by $\rho \otimes P_X$. Then, the total system is composed of the quantum system $\mathcal{H}_E$ and the classical systems $f_X(A)$ and $X$, and the state is given as $\sum_x P_X(x) |x\rangle\langle x| \otimes \rho_{f_x(A),E}$, where $\rho_{f_x(A),E} := \sum_b |b\rangle\langle b| \otimes \bigl(\sum_{a : f_x(a) = b} P_A(a) \rho_{E|a}\bigr)$. The security is evaluated by $I'_{\rho \otimes P_X}(f_X(A) : E, X)$, which can be expressed as

\[ I'_{\rho \otimes P_X}(f_X(A) : E, X) = \mathrm{E}_X I'_\rho(f_X(A) : E). \tag{8.290} \]

That is, when we employ the random choice of the hash function, it is sufficient to focus on the expectation $\mathrm{E}_X I'_\rho(f_X(A) : E)$.

An ensemble of the functions $f_X$ is called universal when it satisfies the following condition [63]:


Condition 8.1 For arbitrary two distinct elements $a_1 \neq a_2 \in \{1, \ldots, d\}$, the probability that $f_X(a_1) = f_X(a_2)$ is at most $\frac{1}{M}$.


Indeed, when the cardinality $d$ is a power of a prime $p$ and $M$ is another power of the same prime $p$, an ensemble $\{f_X\}$ satisfying the above condition is given by the concatenation of a Toeplitz matrix and the identity, $(X, I)$ [64], with only $\log_p d - 1$ random variables taking values in the finite field $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. That is, the matrix $(X, I)$ has a small calculation complexity.
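The construction $(X, I)$ can be tried out for $p = 2$. The following is a minimal sketch, not the construction of [64] verbatim: it assumes inputs are bit vectors of length `n_bits` (so $d = 2^{n\_bits}$), outputs have `m_bits` bits (so $M = 2^{m\_bits}$), and the Toeplitz block is filled from `n_bits - 1` seed bits. It enumerates all seeds and all input pairs to find the worst collision probability in Condition 8.1; all names are hypothetical.

```python
from itertools import product

n_bits, m_bits = 5, 2            # d = 2**5 inputs, M = 2**2 outputs
cols = n_bits - m_bits           # width of the Toeplitz block X

def toeplitz(seed_bits):
    """m_bits x cols Toeplitz matrix over F_2 from m_bits + cols - 1 seed bits."""
    return [[seed_bits[i - j + cols - 1] for j in range(cols)] for i in range(m_bits)]

def hash_bits(seed_bits, a):
    """Apply (X, I) to a = (a1, a2): returns X a1 + a2 over F_2."""
    X = toeplitz(seed_bits)
    a1, a2 = a[:cols], a[cols:]
    return tuple((sum(X[i][j] * a1[j] for j in range(cols)) + a2[i]) % 2
                 for i in range(m_bits))

seeds = list(product((0, 1), repeat=n_bits - 1))   # every choice of the random seed
inputs = list(product((0, 1), repeat=n_bits))

worst = 0.0
for idx, a in enumerate(inputs):
    for b in inputs[idx + 1:]:
        collisions = sum(hash_bits(s, a) == hash_bits(s, b) for s in seeds)
        worst = max(worst, collisions / len(seeds))

print(worst, 1 / 2 ** m_bits)
```

The worst collision probability over all pairs of distinct inputs does not exceed $1/M$, so this small ensemble satisfies Condition 8.1 while using only `n_bits - 1` random bits.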


Theorem 8.15 ([65]) When the ensemble of the functions $\{f_X\}$ is universal, it satisfies


Ip@py (fx (A) : E, X) < Ingp, (fx (A) : E, X) = Exl,(fx(A) : E) 


SMS ‘ s(log M— Fh 4s\(A|E)) 
cE eo As(AIE) — ys © 


Ss AY 


(8.291) 


where v is the number of eigenvalues of pe. 
That is, there exists a function $f : \mathcal{A} \to \{1, \ldots, M\}$ such that


s (log M—Fh 4))(A|E)) 


(f(A): BE) < v= (8.292) 
Ss 


Next, we consider the case when our state is given by the n-fold independent and 
identical state p, i.e., p®”. We define the optimal generation rate 


Tyan (fn (A) : E) _ 


G(p):= sup lim Hoes Mi 
(tion [222 | tim Bernt) _ 
noo log M,, 


Tion(Jn(A) : EY _ of 
n 


n—>Oo 


log M,, 
—— _ | lim 
n—- oo 


’ 


{(fn.Mn)} [™7 0 


= sup 2 


whose classical version is treated by [66]. The second equation holds as follows. the 


ate. ae Hen (fn(A)) (on™ Imi. aca) 
Fi mix, fn (A) a 
condition limy—oo aes” ale n = 0. 


Lan (fn(A):E) 
n 


= | is equivalent with lim,_.. 


Aan Chr (A)) 


= 0 and limy-+. oe 


Hence, limp 
Von fn(A):E) 


= | if and only if 


limy-+ oo 0. 
When the generation rate R = limy_+ oo 10g Mn ig smaller than H (A|E), there exists 
a sequence of functions f, : A — {1,..., e”®)\ such that 
8 RA 4 sip (A|E)) 
Tian (fn(A) : E) < vu, ————_, (8.293) 


AY 


where v,, is the number of eigenvalues of pe. which is a polynomial increasing for 
n because of (3.9). Since lim,_,o Fly 4s)p(AlE)) = H,(A|E)), there exists a number 
s € (0, 1] such that s(R — Fy 4sip(AlE)) > 0. Thus, the right hand side of (8.293) 
goes to zero exponentially. Conversely, due to (8.12), any sequence of functions 
fn A” {1,..., e"*} satisfies that 


im ee nA) “ Hye (A|E) 
n 


noo n 


= H,(A|E). (8.294) 


: Han (fu(A 
When limy-; 00 A es 1 


nR = oe 
Tyan (fn(A) 2 E ~ Ayen(fn(A)|E 
im tA: 2) py Hom GnfADIE) 
noo n noo n 


> R—H,(AlE). (8.295) 


That is, when R > H,(A|E), ao does not go to zero. Hence, we derive the 
formula by [67, 68]: 
\[ G(\rho) = H_\rho(A|E). \tag{8.296} \]


In order to treat the speed of this convergence, we focus on the supremum of the 
exponentially decreasing rate (exponent) of I jan (fn(A) : E) fora given R 


e;(p|R) 


— log Vien (fn(A) : E) — log M,, 
:= sup ; lim x lim = <Ry. 
{fn Mn) yr” ut eee At 


Since the relation s Ay4s))e(A|E) = ns Ay4s,)(A|E) holds, the inequality (8.293) 
implies that 


e1(e|R) > en(p|R) = max sH14s\p(A|E) — sR 


= max 5(Ai4s,p(A|E) — R), (8.297) 


whose commutative version coincides with the bound given in [69]. 
Next, we apply our evaluation to the criterion d|(A : E|p). When { fx} satisfies 
Condition 8.1, combining (3.53), (3.50) and (8.291), we obtain 


s/2ygs/2 “ 
V2 PMO $s ( ALB), 


Vs 


Exdj(fx(A) : Elp) <yExdi(fx(A) : Elp)? < 
(8.298) 


s 


K Ms 
Ex F'(fx(A) : Elp) =1—- oe (8.299) 
S 


Then, similarly we can derive their exponentially decreasing rate (exponent) in the 
n-fold asymptotic setting. 


8.14.2 Proof of Theorem 8.15 


In order to show Theorem 8.15, we prepare the following two lemmas. 


Lemma 8.21 The matrix inequality $(I + X)^s \le I + X^s$ holds with a non-negative matrix $X$ and $s \in (0, 1]$.


Proof Since $I$ is commutative with $X$, it is sufficient to show that $(1 + x)^s \le 1 + x^s$ for $x \ge 0$. So, we obtain the matrix inequality. □


Lemma 8.22 The matrix inequality $\log(I + X) \le \frac{1}{s} X^s$ holds with a non-negative matrix $X$ and $s \in (0, 1]$.


Proof Since $I$ is commutative with $X$, it is sufficient to show that $\log(1 + x) \le \frac{x^s}{s}$ for $x \ge 0$. Since the inequalities $(1 + x)^s \le 1 + x^s$ and $\log(1 + x) \le x$ hold for $x \ge 0$ and $0 < s \le 1$, the inequalities

\[ \log(1 + x) = \frac{\log(1 + x)^s}{s} \le \frac{\log(1 + x^s)}{s} \le \frac{x^s}{s} \tag{8.300} \]

hold. □
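Because both lemmas reduce to scalar inequalities on the spectrum of $X$, they can be sanity-checked on a grid. The sketch below is only an illustration on a hypothetical grid of values $x \in [10^{-6}, 10^{6}]$ and $s \in (0, 1]$; a small relative tolerance absorbs floating-point rounding in the equality case $s = 1$.

```python
import math

def check(x, s):
    """Scalar versions of Lemma 8.21 and Lemma 8.22 at a single point (x, s)."""
    ok_821 = (1 + x) ** s <= (1 + x ** s) * (1 + 1e-12)
    ok_822 = math.log(1 + x) <= (x ** s / s) * (1 + 1e-12)
    return ok_821 and ok_822

grid_x = [10 ** (e / 4) for e in range(-24, 25)]   # x from 1e-6 to 1e6
grid_s = [j / 20 for j in range(1, 21)]            # s in (0, 1]
all_ok = all(check(x, s) for x in grid_x for s in grid_s)

print(all_ok)
```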


Now, we prove Theorem 8.15. 


Ex!'(fx(A) : Elp) 


M 
1 
-0 (Zee > rag! 08) 


i=l a: fx(a)j=i 


1 
=Ex >) Tr Pa (a)p% (log ( > Pata’)p%) — log we) 
& a’: fx(@’)=fx(a) 


1 
<>) Pa(a) Tr pe (log (s ¥ Pa(a')p,) — log wr) (8.301) 
i a’: fx(a)=fx@ 


1 
=>) Pala) Tr pf; (m (» (a)p%, + Ex .? Pa wt) — log we) 


M 
a’: fx (a)=fx(@),a'#a 


1 1 
< > Pa(a) Tr p& (ms (» (a) pe + 7 >= Pa wt) — log i) (8.302) 


a a’:a'#a 


1 1 
<>) Pa(a) Tr pe (108 (Ps (a) pe + wzPt) — log art) 
a 
<> Pal@ Trp (log (vPa(akpg (4) + pe) — log ~ pe (8.303) 
= - E PE\PE M M 


=>" Pala) Tr pf log(vM Pa(a)kipg (OG) Pg + Ds 
a 


where (8.301) follows from the matrix concavity of $x \mapsto \log x$, (8.302) follows from Condition 8.1 and the matrix monotonicity of $x \mapsto \log x$, and (8.303) follows from (3.146) and the matrix monotonicity of $x \mapsto \log x$.
Using Lemma 8.22, we obtain


= Pa(a) Tr pf log(vM Pa (a) kp, (0%) Pg! + D 


1 a a —ly\s 
=. > PaCa) Tr pe (UM Pa(a)kp, (P)05') 


uv’ M* 


S 


> Pua) Wales) ey 


v’ Ms _ v’ MS as 
=e ee eg le), (8.304) 
S S 


where (8.304) follows from (5.57). 
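8.15 Duality Between Two Conditional Entropies


8.15.1 Recovery of Maximally Entangled State from Evaluation of Classical Information


Firstly, for a given state $\rho$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$, we consider a sufficient condition to approximately generate the maximally entangled state $|\Phi\rangle := \frac{1}{\sqrt{d}}\sum_{j=1}^d |u_j\rangle \otimes |u'_j\rangle \in \mathcal{H}_A \otimes \mathcal{H}_{A'}$, where $\{u_j\}$ and $\{u'_j\}$ are the CONSs of $\mathcal{H}_A$ and $\mathcal{H}_{A'}$, respectively. For this purpose, we focus on the following two conditions for a pure state $\rho = |\Psi\rangle\langle\Psi|$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_R$.


$\epsilon_1$-bit security: The PVM $E := \{|u_j\rangle\langle u_j|\}_{j=1}^d$ satisfies $F(\kappa_E \otimes \iota_R(\rho_{AR}), \rho_{\mathrm{mix},A} \otimes \rho_R) \ge 1 - \epsilon_1$.

$\epsilon_2$-bit recoverability: There exists a POVM $M = \{M_j\}_{j=1}^d$ on $\mathcal{H}_B$ such that $\sum_{j=1}^d \mathrm{Tr}\,\rho_{AB}\,|u_j\rangle\langle u_j| \otimes M_j \ge 1 - \epsilon_2$.


Then, we obtain the following theorem.


Theorem 8.16 (Renes [70]) Assume that a pure state $\rho = |\Psi\rangle\langle\Psi|$ on the system $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_R$ satisfies both of the above conditions. Then, there is a TP-CP map $\kappa : \mathcal{S}(\mathcal{H}_B) \to \mathcal{S}(\mathcal{H}_{A'})$ such that

\[ F(\iota_A \otimes \kappa(\rho_{AB}), |\Phi\rangle\langle\Phi|) \ge 1 - (\sqrt{\epsilon_1} + \sqrt{\epsilon_2})^2. \tag{8.305} \]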


Theorem 8.16 guarantees that we can approximately generate the maximally entangled state between two systems $\mathcal{H}_A$ and $\mathcal{H}_B$ only by an operation on the system $\mathcal{H}_B$ if the classical information on the specific basis of the system $\mathcal{H}_A$ can be recovered from the system $\mathcal{H}_B$, is close to a uniform random number, and is almost independent of the environment system $\mathcal{H}_R$. These conditions can be easily checked because all of them concern only the classical information of the specific basis. That is, we do not have to care about other information for constructing the maximally entangled state.


Proof Step 1: Case when «; = 0: First, we choose an isometry Ug : Hg —> 
Ha @ He such that Mj = U*|u;)(u;| ® IU. Then, the state Ug|W) can be 
written as 


d 
Usl¥) = >) Valu ju), xB), XRij)- 


j=l 


Next, we choose the purification |7qg-R,;) of pr such that F(|xa\j)(xr jl, PR) = 
F(lxpyj,Xryj) XBj>Xrijls Warr) (Warr). Now, we denote |~grrii) by |wWarr). 
Thus, there is a unitary Ug); on Hg such that Ug) ;|Werrj) = |Yerr). Then, 


we define the pure state |€) := Looe |uj, u'., Werrij) and the unitary U, on 
Ha ® He such that U, = @4_,|u)(u, | ® Upyj. Thus, U,|€) = |®) @ lave). 


Since Kz @ tr(Par) = ae qjluj)(uj| ® lea) (xa, 


d 
1-1 < F(Kg @ te (Par): Pmix,A @ Pr) = >. / AF (lee) (Rij Pr) 
j=l 


d 
qj 
=> 4) qlee XR) (XB js XRIjl, errs) arr l) 
j=l 


=F (Up|W)(W|Up, |€) (El) = F(UgUBIY) (Y|URUR, |P)(P| ® Warr) (Yarrl) 
<F (Trprp Uj,U pU%U, |©)()). (8.306) 


Hence, defining the TP-CP map k : S(Hg) > S(H,4 ® Hp’) by 
K(o) := Trg U,Ugo(UZUR)", (8.307) 


we obtain (8.305). 
Step 2: General case: In the general case, the state Ug|W) can be written as 


d 
Us|¥) = D> Valu; XA‘ js XB j>XR\j) 


j=l 


satisfying that (uj |x4"|;) = 0. We define another state by 


d 
W') = UR >, Sq luj, ui, XB1j, XR1j) 


j=l 

and p’ := |W')(W"|, which satisfies the assumption of Step 1. Since 4 qj\ 

(u';|xa"1j) (= ye Tr pagl|u;)(u;| ® Mj, the €2-bit recoverability guarantees that 
F(\W)(Y |, [W)('|) = FU) (WU 5*, Us| WU) = Da (ui |xanj) 


d 
= 2a (wi lxanj)P? = 1—e. (8.308) 


Since Kg @tr(P'yp) = 4 qj\uj)(uj|® |xr\j)(Xrjl = KE @LR (Par), by applying 
Step | to the state p’, the TP-CP map k : S(Hg) > S(H,4 @ Hze:) defined in (8.307) 
satisfies 


1-1 < F(a @ K(p4g), IP) (PI). (8.309) 


Then, (8.308) and (8.309) yield that 
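\begin{align*}
&b(\iota_A \otimes \kappa(\rho_{AB}), |\Phi\rangle\langle\Phi|) \\
&\le b(\iota_A \otimes \kappa(\rho_{AB}), \iota_A \otimes \kappa(\rho'_{AB})) + b(\iota_A \otimes \kappa(\rho'_{AB}), |\Phi\rangle\langle\Phi|) \\
&\le b(\rho, \rho') + b(\iota_A \otimes \kappa(\rho'_{AB}), |\Phi\rangle\langle\Phi|) \le \sqrt{\epsilon_2} + \sqrt{\epsilon_1},
\end{align*}

which implies (8.305). □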


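Theorem 8.16 requires $\epsilon_1$-bit security. However, in some cases, the $\epsilon_1$-bit security holds only with the partial trace over a part of the system $\mathcal{H}_A$. In order to address such a case, we generalize Theorem 8.16 as follows. Now, we consider the following conditions for a pure state $\rho = |\Psi\rangle\langle\Psi|$ on the system $\mathcal{H}_{A_1} \otimes \mathcal{H}_{A_2} \otimes \mathcal{H}_B \otimes \mathcal{H}_R$.


$\epsilon_1$-bit security for $\mathcal{H}_{A_1}$: The PVM $E^1 := \{|u^1_j\rangle\langle u^1_j|\}_{j=1}^{d_1}$ on $\mathcal{H}_{A_1}$ satisfies $F(\kappa_{E^1} \otimes \iota_R(\rho_{A_1 R}), \rho_{\mathrm{mix},A_1} \otimes \rho_R) \ge 1 - \epsilon_1$.

$\epsilon_2$-bit recoverability for $\mathcal{H}_{A_1} \otimes \mathcal{H}_{A_2}$: There exists a POVM $M = \{M_{j_1,j_2}\}$ on $\mathcal{H}_B$ such that $\sum_{j_1,j_2} \mathrm{Tr}\,\rho_{A_1 A_2 B}\,|u^1_{j_1}, u^2_{j_2}\rangle\langle u^1_{j_1}, u^2_{j_2}| \otimes M_{j_1,j_2} \ge 1 - \epsilon_2$.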
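Theorem 8.17 Assume that a pure state $\rho = |\Psi\rangle\langle\Psi|$ on the system $\mathcal{H}_{A_1} \otimes \mathcal{H}_{A_2} \otimes \mathcal{H}_B \otimes \mathcal{H}_R$ satisfies both of the above conditions. Let $\{v_l\}$ be a basis mutually unbiased to $\{u^2_j\}$ of $\mathcal{H}_{A_2}$. We can choose a TP-CP map $\kappa_l : \mathcal{S}(\mathcal{H}_B) \to \mathcal{S}(\mathcal{H}_{A_1'})$ dependently of $l$ such that

\begin{align}
&F\Bigl(\sum_{l=1}^{d_2} \iota_{A_1} \otimes \kappa_l(\langle v_l|\rho_{A_1 A_2 B}|v_l\rangle) \otimes |v_l\rangle\langle v_l|, |\Phi\rangle\langle\Phi| \otimes \rho_{\mathrm{mix},A_2}\Bigr) \nonumber \\
&\ge 1 - (\sqrt{\epsilon_2} + \sqrt{\epsilon_1})^2 \ge 1 - 2(\epsilon_2 + \epsilon_1). \tag{8.310}
\end{align}

In particular, when the PVM $F^2 := \{|v_l\rangle\langle v_l|\}_{l=1}^{d_2}$ on $\mathcal{H}_{A_2}$ satisfies $\kappa_{F^2}(\rho_{A_2}) = \rho_{\mathrm{mix},A_2}$,

\begin{align}
&\sum_{l=1}^{d_2} \frac{1}{d_2} F\bigl(\iota_{A_1} \otimes \kappa_l(d_2 \langle v_l|\rho_{A_1 A_2 B}|v_l\rangle), |\Phi\rangle\langle\Phi|\bigr) \nonumber \\
&\ge 1 - (\sqrt{\epsilon_2} + \sqrt{\epsilon_1})^2 \ge 1 - 2(\epsilon_2 + \epsilon_1). \tag{8.311}
\end{align}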

Theorem 8.17 relaxes the conditions of Theorem 8.16. That is, Theorem 8.17 has a wider applicability than Theorem 8.16. In fact, Theorem 8.17 plays an important role in Sect. 9.6, where Lemma 9.7 will be shown by Theorem 8.17. The Hashing inequality (8.121) for entanglement distillation will also be shown in Sect. 9.6, and Lemma 9.7 plays an essential role in that proof.


Proof Step 1: Case when €2 = 0: Due to the relation (7.52), since €2 = 0, the PVM 
F’ satisfies 


LAR @ Kp? (Pa, Ark) = Pmix,A, ® PA,R> (8.312) 


which implies that (v;|p4,|v:) = 4 Thus, we have 


2 
dy 


1 

De GP! B te (do(v11p4,4z8l01)), Pmix., ® PR) 
2 

l=1 


=F (Kg @ Kp @ br(PA,ArR)s Pmix,A, ® Pmix,A, ® PR) 
=F (Kg! @ tr(PA,R)s Pmix,A, @ Pr) = 1—e. 


Due to Theorem 8.16 with €; = 0, there exists a TP-CP map Kk; : S(Hg) > S(Ha,) 
dependently of / such that 


(> ta ® scion al) |®)(D|) 


1 
>F (Kp! ® tr(do(vj|pa,arR|V))s Pmix,A, ® Pr)- (8.313) 


Thus, 


dy 
(> La, ® Ki ((vi|Pa,A,B|U1)) ® |vr){Yz|, Pmix,a, ® ove) 


l=1 


1 
= DP (4 8 Gluloaaely), 1)! 
l=1 “ 1 
dy 
=D 7 its ® tr (da(u5,|Pa,ArR|M5,))» Pmix,A, ® Pr) = 1—€1. 
5 2. 
j=l 


Step 2: General case: In the general case, we choose an isometry Ug : Hg > 
Ha, @ Ha, ® He such that Mj, j;, = U*|ui, ui) (ui. ui @ IpU. Then, the state 
Upz|W) can be written as 


= sola! 34,2: x2 2 ~ fds 
Uz|¥) = > VGji.jn\4j,> Wins X Ar jp» X ASL jy? XB iodo? XRM iio) 
Asp 


satisfying that Cad aii (u2,'[xA41 0) > 0. We define another state 


d 
N\ yk 1 2 1¢ ot 
lv) =U, > AG juin Mj Wigs Uy 0 UG,» XB iin *Rli he) 
Ap 


and p! := |W')(W’|. Since 


d d 
i? 2h 2 “sie i pa 28 
De Ginsial Uj, 9 Wy agi Fated” = DL Te Pas dawle yj Gy) (Uj, Hj,1 @ Mis 
Av din 


similar to (8.308), the €2-bit recoverability guarantees that 
\[ F(|\Psi\rangle\langle\Psi|, |\Psi'\rangle\langle\Psi'|) \ge 1 - \epsilon_2. \tag{8.314} \]


Since Kg) 72 @ lr(,,4,r) = bara di.nl4jy> ui us,» us| @ lXrijn) OR Apl = 
Kp 2 ® tr(Pa, A,r), applying Step | to the state p’, we can choose the TP-CP map 
kt: SCHg) > S(Ha, ® He’) such that 


dy 
103 1a, & K1(Uilp's, appl) ® |01) (v1L, mix. ar @ ove) > 1—e. (8.315) 
l=1 


We denote the TP-CP map 04, 4,5 b> YS ta, ® Ki ((Uj|o4,4,B17)) ® |vr)(vz| by 
«. Then, (8.314) and (8.315) yield that 


1=1 
<D(K(pPA,ArB)> KO's, AB) a b(K(PA,ArB) Pmix,Ar ® |D) (P|) 
<b(p, p)+ Ja < Jat Je, 


which implies (8.310). a 


dp 
(> ta, @ Ki ((v;|P4,A,B1U1)) ® |7)(W|, Pmix,ar @ ove) 


8.15.2 Duality Between Two Conditional Entropies 
of Mutually Unbiased Basis 


Now, we revisit (7.51) with the mutually unbiased bases $\{u_j\}_{j=1}^d$ and $\{v_l\}_{l=1}^d$. In this
case, we have $c = 1/\sqrt{d}$. So, if the outcome of $E' = \{|v_l\rangle\langle v_l|\}$ is almost determined
by the information in $\mathcal{H}_B$, i.e., the conditional entropy $H_{\kappa_{E'}\otimes\iota_B(\rho_{A,B})}(A|B)$ is small,
the other conditional entropy $H_{\kappa_E\otimes\iota_E(\rho_{A,E})}(A|E)$ with $E = \{|u_j\rangle\langle u_j|\}$ is almost equal
to the maximum value $\log d$. That is, the information in the basis $\{u_j\}$ is almost
independent of the information in $\mathcal{H}_E$.

Now, we raise the reverse question: whether the outcome of E' is almost determined
by the information in $\mathcal{H}_B$ when the outcome of E is almost independent of
the information in $\mathcal{H}_E$. Theorem 8.16 gives the solution when the outcome of E is
almost determined by the information in $\mathcal{H}_B$. That is, we can show the following
theorem by using Theorem 8.16.


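Theorem 8.18 Assume that E and E' are the PVMs given by two arbitrary bases
$\{u_j\}_{j=1}^d$ and $\{v_l\}_{l=1}^d$, respectively. When $H_{\kappa_E\otimes\iota_E(\rho_{A,E})}(A|E) = \log d - \epsilon_1$ and the $\epsilon_2$-bit
recoverability holds for the state $\rho$, we have

$$H_{\kappa_{E'}\otimes\iota_B(\rho_{A,B})}(A|B) \le \log d\,\big(\sqrt{\epsilon_2} + \sqrt[4]{\epsilon_1/2}\big)^2 + h\big((\sqrt{\epsilon_2} + \sqrt[4]{\epsilon_1/2})^2\big). \qquad (8.316)$$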


Then, under the $\epsilon_2$-bit recoverability, the two conditional entropies $H_{\kappa_E\otimes\iota_E(\rho_{A,E})}(A|E)$
and $H_{\kappa_{E'}\otimes\iota_B(\rho_{A,B})}(A|B)$ satisfy the following equivalence: $H_{\kappa_E\otimes\iota_E(\rho_{A,E})}(A|E)$
is close to the maximal value $\log d$ if and only if $H_{\kappa_{E'}\otimes\iota_B(\rho_{A,B})}(A|B)$ is close to zero.
This relation can be regarded as a kind of duality relation. That is, we obtain the
argument reverse to (7.51) under the $\epsilon_2$-bit recoverability for the state $\rho$ when the two
bases $\{u_j\}$ and $\{v_l\}$ are mutually unbiased. Although Theorem 8.18 holds with
two arbitrary bases, the relation (7.51) gives a weaker evaluation in the general case.
Hence, the above equivalence relation cannot be shown from (7.51) and Theorem
8.18 in the general case.
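
The size of the bound (8.316) can be explored numerically. The following minimal Python sketch assumes that the right-hand side reads $\log d\cdot\delta + h(\delta)$ with $\delta = (\sqrt{\epsilon_2} + \sqrt[4]{\epsilon_1/2})^2$ and that logarithms are natural; the names `binary_entropy` and `duality_bound` are illustrative and do not come from the text.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy h(x) in nats, with h(0) = h(1) = 0 by convention."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def duality_bound(d: int, eps1: float, eps2: float) -> float:
    """Assumed reading of the RHS of (8.316): log(d) * delta + h(delta),
    where delta = (sqrt(eps2) + (eps1 / 2) ** (1/4)) ** 2.
    For delta >= 1 the value exceeds log d, so the bound is vacuous there."""
    delta = (math.sqrt(eps2) + (eps1 / 2.0) ** 0.25) ** 2
    return math.log(d) * delta + binary_entropy(delta)


if __name__ == "__main__":
    for eps1 in (1e-8, 1e-6, 1e-4):
        print(eps1, duality_bound(4, eps1, 1e-6))
```

Because $\epsilon_1$ enters through a fourth root, the bound shrinks slowly: under this reading, $\epsilon_1$ must be very small before the conditional entropy $H(A|B)$ is forced close to zero.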


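Proof Since

$$\epsilon_1 = \log d - H_{\kappa_E\otimes\iota_E(\rho_{A,E})}(A|E) = D(\kappa_E\otimes\iota_E(\rho_{A,E}) \,\|\, \rho_{\mathrm{mix},A}\otimes\rho_E)$$
$$\ge \frac{1}{2}\big\|\kappa_E\otimes\iota_E(\rho_{A,E}) - \rho_{\mathrm{mix},A}\otimes\rho_E\big\|_1^2 \ge 2\big(1 - F(\kappa_E\otimes\iota_E(\rho_{A,E}), \rho_{\mathrm{mix},A}\otimes\rho_E)\big)^2,$$

we have

$$1 - \sqrt{\epsilon_1/2} \le F(\kappa_E\otimes\iota_E(\rho_{A,E}), \rho_{\mathrm{mix},A}\otimes\rho_E).$$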


Thus, 


ae ui |Kp ®@ K(pap)|Uj,U;) 


IV 
alo 


(ui UW |Kp @ K(pap)|Uj, U5) 

=F (KE @ lp(ta @ K(paB)), KE’ @ ba (|P)(P|)) 

> F (K(aB), |®)(P|) = 1— Cf + Se1/2)”. 
When X and Y are the random variables subject to the distribution P(X = x, Y = 
x) = (Us, Wy Kp ® K(pan)lux, Wi) = Trg ® ba(Pa.e)|ax) (lx| @ Ku) (aL), 
Fano inequality guarantees that 


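$$H_{\kappa_{E'}\otimes\iota_B(\rho_{A,B})}(A|B) \le H(X|Y) \le \log d\,\big(\sqrt{\epsilon_2} + \sqrt[4]{\epsilon_1/2}\big)^2 + h\big((\sqrt{\epsilon_2} + \sqrt[4]{\epsilon_1/2})^2\big).$$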


8.16 Examples 


In this section, we summarize the preceding calculation of entanglement measures 
using several examples in the mixed-state case. 
8.16.1 2 × 2 System


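In the case of $\mathbb{C}^2\otimes\mathbb{C}^2$, Wootters [71] calculated the entanglement of formation as

$$E_f(\rho) = h\Big(\frac{1+\sqrt{1-C_o(\rho)^2}}{2}\Big), \quad C_o(\rho) := \max\{0, \lambda_1-\lambda_2-\lambda_3-\lambda_4\}, \qquad (8.317)$$

where $\lambda_i$ is the square root of the i-th eigenvalue of $\rho(S_2\otimes S_2)\bar{\rho}(S_2\otimes S_2)$ in decreasing
order, and $\bar{\rho}$ is the complex conjugate of $\rho$. The function $C_o(\rho)$ is called concurrence. When we perform an instrument
$\{\kappa_\omega\}_\omega$ with the separable form $\kappa_\omega(\rho) = (A_\omega\otimes B_\omega)\rho(A_\omega\otimes B_\omega)^*$, the final state
$\frac{(A_\omega\otimes B_\omega)\rho(A_\omega\otimes B_\omega)^*}{\mathrm{Tr}(A_\omega\otimes B_\omega)\rho(A_\omega\otimes B_\omega)^*}$ has the following concurrence [72, 73] (Exe. 8.75):

$$C_o\Big(\frac{(A_\omega\otimes B_\omega)\rho(A_\omega\otimes B_\omega)^*}{\mathrm{Tr}(A_\omega\otimes B_\omega)\rho(A_\omega\otimes B_\omega)^*}\Big) = C_o(\rho)\frac{|\det A_\omega||\det B_\omega|}{\mathrm{Tr}(A_\omega\otimes B_\omega)\rho(A_\omega\otimes B_\omega)^*}. \qquad (8.318)$$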


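For example, the concurrence of the Bell diagonal state $\rho_{\mathrm{Bell},p} := \sum_{i=0}^{3} p_i|e_i^{A,B}\rangle\langle e_i^{A,B}|$
is calculated as (Exe. 8.76)

$$C_o(\rho_{\mathrm{Bell},p}) = 2\max_i p_i - 1, \qquad (8.319)$$

and it does not increase by any stochastic operation [74] (Exe. 8.77):

$$C_o(\rho_{\mathrm{Bell},p}) \ge C_o\Big(\frac{(A_\omega\otimes B_\omega)\rho_{\mathrm{Bell},p}(A_\omega\otimes B_\omega)^*}{\mathrm{Tr}(A_\omega\otimes B_\omega)\rho_{\mathrm{Bell},p}(A_\omega\otimes B_\omega)^*}\Big). \qquad (8.320)$$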


This state satisfies

$$-H_{\rho_{\mathrm{Bell},p}}(A|B) = \log 2 - H(p), \quad I_{\rho_{\mathrm{Bell},p}}(A:B) = 2\log 2 - H(p).$$


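Further, the maximally correlated state $\rho_{a,b} := a|00\rangle\langle 00| + b|00\rangle\langle 11| + b|11\rangle\langle 00| + (1-a)|11\rangle\langle 11|$
(Exe. 8.78) has the concurrence $2b$ (Exe. 8.79). Hence,

$$E_f(\rho_{a,b}) = h\Big(\frac{1+\sqrt{1-4b^2}}{2}\Big).$$

Regarding distillation, from (8.143) and (8.223) we have (Exe. 8.80)

$$E_{d,2}^{C}(\rho_{a,b}) = E_{d,2}^{C,\dagger}(\rho_{a,b}) = E_{r,S}(\rho_{a,b}) = E_{r,\mathrm{PPT}}(\rho_{a,b}) = -H_{\rho_{a,b}}(A|B)$$
$$= h(a) - h\Big(\frac{1+\sqrt{(2a-1)^2+4b^2}}{2}\Big) \qquad (8.322)$$

for $C = \rightarrow, \leftarrow, \leftrightarrow, S$, and PPT. Further,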


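$$I_{\rho_{a,b}}(A:B) = 2h(a) - h\Big(\frac{1+\sqrt{(2a-1)^2+4b^2}}{2}\Big), \quad C_d^{A\to B}(\rho_{a,b}) = E_p(\rho_{a,b}) = h(a).$$

Since Ishizaka [48] proved $\tau^A(|\tau^A(\rho)|) \ge 0$ for the 2 × 2 case, the relation

$$E_{c,e}^{\mathrm{PPT},\infty}(\rho_{a,b}) = \log\|\tau^A(\rho_{a,b})\|_1 = \log(1+2b)$$

holds. Hence, comparing these values, we obtain the inequality

$$\log(1+2b) \ge h\Big(\frac{1+\sqrt{1-4b^2}}{2}\Big) \ge h(a) - h\Big(\frac{1+\sqrt{(2a-1)^2+4b^2}}{2}\Big)$$

for $\sqrt{a(1-a)} \ge b$. In particular, the equality in the second inequality holds only when $\sqrt{a(1-a)} = b$,
i.e., the state $\rho_{a,b}$ is pure.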
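
The three quantities compared above for the maximally correlated state can be checked numerically. The sketch below is only an illustration: it assumes the closed forms $E_f = h((1+\sqrt{1-4b^2})/2)$, $-H(A|B) = h(a) - h((1+\sqrt{(2a-1)^2+4b^2})/2)$ and the exact PPT cost $\log(1+2b)$, uses natural logarithms, and the function names are hypothetical.

```python
import math


def h(x: float) -> float:
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def maximally_correlated_measures(a: float, b: float) -> dict:
    """Closed-form measures of rho_{a,b} (assumed formulas, see lead-in).
    Positivity of the state requires 0 <= b <= sqrt(a (1 - a))."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= math.sqrt(a * (1.0 - a)) + 1e-12):
        raise ValueError("need 0 <= a <= 1 and 0 <= b <= sqrt(a(1-a))")
    concurrence = 2.0 * b
    formation = h((1.0 + math.sqrt(max(0.0, 1.0 - 4.0 * b * b))) / 2.0)
    joint_entropy = h((1.0 + math.sqrt((2.0 * a - 1.0) ** 2 + 4.0 * b * b)) / 2.0)
    return {
        "concurrence": concurrence,
        "E_f": formation,
        "minus_H_A_given_B": h(a) - joint_entropy,
        "log_trace_norm": math.log(1.0 + 2.0 * b),
    }


if __name__ == "__main__":
    m = maximally_correlated_measures(0.5, 0.25)
    # Ordering claimed in the text: log(1+2b) >= E_f >= -H(A|B)
    print(m["log_trace_norm"], m["E_f"], m["minus_H_A_given_B"])
```

In the pure case $b = \sqrt{a(1-a)}$ the last two values coincide with $h(a)$, which matches the equality condition stated above.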


8.16.2 Werner State 


Next, we consider the Werner state:

$$\rho_{W,p} := (1-p)\rho^s_{\mathrm{mix}} + p\rho^a_{\mathrm{mix}} = \frac{p}{d(d-1)}(I - F) + \frac{1-p}{d(d+1)}(I + F), \qquad (8.323)$$

where $\rho^s_{\mathrm{mix}}$ ($\rho^a_{\mathrm{mix}}$) is the completely mixed state on the symmetric space (antisymmetric
space). We can easily check that

$$-H_{\rho_{W,p}}(A|B) = \log d + p\log\frac{2p}{d(d-1)} + (1-p)\log\frac{2(1-p)}{d(d+1)},$$
$$I_{\rho_{W,p}}(A:B) = 2\log d + p\log\frac{2p}{d(d-1)} + (1-p)\log\frac{2(1-p)}{d(d+1)}. \qquad (8.324)$$


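Further, any pure state $|u\rangle\langle u|$ on $\mathcal{H}_A$ satisfies

$$d\,\mathrm{Tr}_A(|u\rangle\langle u|\otimes I_B)\rho^a_{\mathrm{mix}} = \frac{I_B - |u\rangle\langle u|}{d-1}, \quad d\,\mathrm{Tr}_A(|u\rangle\langle u|\otimes I_B)\rho^s_{\mathrm{mix}} = \frac{I_B + |u\rangle\langle u|}{d+1}.$$

Thus,

$$d\,\mathrm{Tr}_A(|u\rangle\langle u|\otimes I_B)\rho_{W,p} = p\frac{I_B - |u\rangle\langle u|}{d-1} + (1-p)\frac{I_B + |u\rangle\langle u|}{d+1},$$

which has entropy $-\frac{(d+1)p+(d-1)(1-p)}{d+1}\log\frac{(d+1)p+(d-1)(1-p)}{d^2-1} - \frac{2(1-p)}{d+1}\log\frac{2(1-p)}{d+1}$. Since
this entropy is independent of $|u\rangle$,

$$C_d^{A\to B}(\rho_{W,p}) = \log d + \frac{2(1-p)}{d+1}\log\frac{2(1-p)}{d+1} + \frac{(d+1)p+(d-1)(1-p)}{d+1}\log\frac{(d+1)p+(d-1)(1-p)}{d^2-1}.$$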

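Using the symmetry of this state, Vollbrecht and Werner [37] showed that

$$E_f(\rho_{W,p}) = \begin{cases} h\Big(\frac{1+2\sqrt{p(1-p)}}{2}\Big) & \text{if } p \ge \frac{1}{2} \\ 0 & \text{if } p < \frac{1}{2}. \end{cases}$$

Rains [75] showed, for $p \ge \frac{1}{2}$,

$$E_{r,S}(\rho_{W,p}) = E_{r,\mathrm{PPT}}(\rho_{W,p}) = \log 2 - h(p).$$

Rains [45] and Audenaert et al. [76] proved

$$\lim_{n\to\infty}\frac{1}{n}E_{r,S}((\rho_{W,p})^{\otimes n}) = E_{\mathrm{SDP}}(\rho_{W,p}) = \begin{cases} 0 & \text{if } p \le \frac{1}{2} \\ \log 2 - h(p) & \text{if } \frac{1}{2} < p \le \frac{1}{2}+\frac{1}{d} \\ \log\frac{d-2}{d} + p\log\frac{d+2}{d-2} & \text{if } \frac{1}{2}+\frac{1}{d} < p \le 1, \end{cases}$$

where d is the dimension of the local system. Note that $\frac{1}{2}+\frac{1}{d} = 1$ when d = 2. Hence,
$E_{r,S}(\rho)$ does not satisfy the additivity. This also implies $\lim_{n\to\infty}\frac{1}{n}E_{r,\mathrm{PPT}}((\rho_{W,p})^{\otimes n}) = E_{\mathrm{SDP}}(\rho_{W,p})$.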
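
The failure of additivity can be seen directly by comparing the single-copy value $\log 2 - h(p)$ with the regularized value. The sketch below assumes the piecewise formula with breakpoints $p = 1/2$ and $p = 1/2 + 1/d$ and third branch $\log\frac{d-2}{d} + p\log\frac{d+2}{d-2}$; natural logarithms are used and the function names are illustrative.

```python
import math


def h(x: float) -> float:
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def werner_rel_entropy_single(p: float) -> float:
    """Single-copy relative entropy of entanglement of the Werner state."""
    return math.log(2.0) - h(p) if p > 0.5 else 0.0


def werner_rel_entropy_regularized(p: float, d: int) -> float:
    """Assumed piecewise formula for lim (1/n) E_{r,S} of n copies."""
    if p <= 0.5:
        return 0.0
    if p <= 0.5 + 1.0 / d:
        return math.log(2.0) - h(p)
    # This branch is only reached for d >= 3, so d - 2 > 0.
    return math.log((d - 2.0) / d) + p * math.log((d + 2.0) / (d - 2.0))


if __name__ == "__main__":
    d = 3
    for p in (0.6, 0.8, 0.9, 1.0):
        print(p, werner_rel_entropy_single(p), werner_rel_entropy_regularized(p, d))
```

For d = 3 and p = 1 the regularized value is $\log(5/3)$, strictly below the single-copy value $\log 2$, while the two agree for every $p$ when d = 2.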
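Further, Rains [45] also showed that

$$E_{d,2}^{\mathrm{PPT}}(\rho_{W,1}) = \log\frac{d+2}{d}.$$

Since $\tau^A(|\tau^A(\rho_{W,p})|) \ge 0$ (Exe. 8.82) [47], we obtain

$$E_{c,e}^{\mathrm{PPT},\infty}(\rho_{W,p}) = \log\|\tau^A(\rho_{W,p})\|_1 = \log\Big(\frac{2(2p-1)}{d} + 1\Big).$$

In particular,

$$E_{d,2}^{\mathrm{PPT}}(\rho_{W,1}) = E_{d,2}^{\mathrm{PPT},\dagger}(\rho_{W,1}) = E_{\mathrm{SDP}}(\rho_{W,1}) = E_c^{\mathrm{PPT}}(\rho_{W,1}) = E_{c,e}^{\mathrm{PPT},\infty}(\rho_{W,1}) = \log\frac{d+2}{d} \le \log 2 = E_f(\rho_{W,1}).$$

The equality in the inequality $\log\frac{d+2}{d} \le \log 2$ holds only when d = 2. From (8.234)
the entanglement of distillation of the state $\rho_{W,1}$ satisfies the additivity

$$E_{d,2}^{\mathrm{PPT}}(\rho_{W,1}) + E_{d,2}^{\mathrm{PPT}}(\rho) = E_{d,2}^{\mathrm{PPT}}(\rho_{W,1}\otimes\rho)$$

for any state $\rho$.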
On the other hand, Yura [77] calculated the entanglement of cost of any state $\rho$ in
the antisymmetric space of the system $\mathbb{C}^3\otimes\mathbb{C}^3$ as

$$E_c(\rho) = \log 2, \qquad (8.325)$$

which is equal to its entanglement of formation $E_f(\rho)$. Hence, in this case,

$$E_c^{\mathrm{PPT}}(\rho_{W,1}) = E_{c,e}^{\mathrm{PPT},\infty}(\rho_{W,1}) = \log\frac{5}{3} < \log 2 = E_c(\rho_{W,1}).$$


Further, Matsumoto and Yura [78] focused on $\rho_{B,R} := \mathrm{Tr}_A|x\rangle\langle x|$, where $|x\rangle$ is a
purification of $\rho_{W,1}$ with the reference system $\mathcal{H}_R$, and showed that

$$E_c(\rho_{B,R}) = \frac{E_f((\rho_{B,R})^{\otimes n})}{n} = E_f(\rho_{B,R}) = \log(d-1). \qquad (8.326)$$

Hence, using (8.173), we have

$$\frac{C_d^{A\to B}((\rho_{W,1})^{\otimes n})}{n} = \log\frac{d}{d-1}.$$

Since $\rho_{W,1}$ and $\rho_{W,0}$ satisfy the condition for (8.175), the relation (8.176) holds, i.e.,

$$E_p(\rho_{W,1}) = E_p(\rho_{W,0}) = \log d. \qquad (8.327)$$

The entanglement of purification $E_p(\rho_{W,p})$ of the other cases has been numerically
calculated by Terhal et al. [40].


8.16.3 Isotropic State 


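Next, we consider the isotropic state

$$\rho_{I,p} := (1-p)\frac{I - |\Phi_d\rangle\langle\Phi_d|}{d^2-1} + p|\Phi_d\rangle\langle\Phi_d| \qquad (8.328)$$
$$= \frac{(1-p)d^2}{d^2-1}\rho_{\mathrm{mix}} + \frac{d^2p-1}{d^2-1}|\Phi_d\rangle\langle\Phi_d|, \qquad (8.329)$$

where $|\Phi_d\rangle = \frac{1}{\sqrt{d}}\sum_i u_i\otimes u_i$. We can easily check that

$$-H_{\rho_{I,p}}(A|B) = \log d + p\log p + (1-p)\log\frac{1-p}{d^2-1},$$
$$I_{\rho_{I,p}}(A:B) = 2\log d + p\log p + (1-p)\log\frac{1-p}{d^2-1}. \qquad (8.330)$$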


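Further, any pure state $|u\rangle\langle u|$ on $\mathcal{H}_A$ satisfies

$$d\,\mathrm{Tr}_A(|u\rangle\langle u|\otimes I_B)\rho_{I,p} = \frac{(1-p)d^2}{d^2-1}\rho_{\mathrm{mix}} + \frac{d^2p-1}{d^2-1}|\bar{u}\rangle\langle\bar{u}|$$
$$= \frac{(1-p)d}{d^2-1}(I - |\bar{u}\rangle\langle\bar{u}|) + \frac{dp+1}{d+1}|\bar{u}\rangle\langle\bar{u}|,$$

where $\bar{u}$ is the complex conjugate of u with respect to the basis $\{u_i\}$. This state has entropy
$-\frac{(1-p)d}{d+1}\log\frac{(1-p)d}{d^2-1} - \frac{dp+1}{d+1}\log\frac{dp+1}{d+1}$. Since this entropy is independent of $|u\rangle$,

$$C_d^{A\to B}(\rho_{I,p}) = \log d + \frac{(1-p)d}{d+1}\log\frac{(1-p)d}{d^2-1} + \frac{dp+1}{d+1}\log\frac{dp+1}{d+1}.$$

Further, King [79] showed that

$$C_d^{A\to B}((\rho_{I,p})^{\otimes n}) = nC_d^{A\to B}(\rho_{I,p}).$$

Define $\rho_{B,R} := \mathrm{Tr}_A|x\rangle\langle x|$, where $|x\rangle$ is a purification of $\rho_{I,p}$ with the reference
system $\mathcal{H}_R$. Then, using (8.173), we have

$$E_c(\rho_{B,R}) = \frac{E_f((\rho_{B,R})^{\otimes n})}{n} = E_f(\rho_{B,R}) = -\frac{(1-p)d}{d+1}\log\frac{(1-p)d}{d^2-1} - \frac{dp+1}{d+1}\log\frac{dp+1}{d+1}.$$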


Using the symmetry of this state, Terhal and Vollbrecht [80] showed that

$$E_f(\rho_{I,p}) = \min_{x,y,q}\; q\big(h(\gamma(x)) + (1-\gamma(x))\log(d-1)\big) + (1-q)\big(h(\gamma(y)) + (1-\gamma(y))\log(d-1)\big),$$

where we take the minimum with the condition $p = qx + (1-q)y$, and

$$\gamma(p) = \frac{1}{d}\big(\sqrt{p} + \sqrt{(d-1)(1-p)}\big)^2.$$

They also showed the following relation for the d = 3 case and conjectured it in the
d > 3 case as follows:

$$E_f(\rho_{I,p}) = \begin{cases} 0 & \text{if } p \le \frac{1}{d} \\ h(\gamma(p)) + (1-\gamma(p))\log(d-1) & \text{if } \frac{1}{d} < p \le \frac{4(d-1)}{d^2} \\ \frac{d\log(d-1)}{d-2}(p-1) + \log d & \text{if } \frac{4(d-1)}{d^2} < p \le 1. \end{cases}$$


Note that the isotropic state is locally unitarily equivalent to the Werner state when 
d=2. 
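
The d = 2 equivalence gives a quick consistency check between the isotropic-state formula and Wootters' formula (8.317). The sketch below assumes $\gamma(p) = \frac{1}{d}(\sqrt{p} + \sqrt{(d-1)(1-p)})^2$, the three-branch expression with breakpoints $1/d$ and $4(d-1)/d^2$ (proved for d = 3, conjectured for d > 3), and the two-qubit concurrence $2p - 1$; function names are illustrative and logarithms are natural.

```python
import math


def h(x: float) -> float:
    """Binary entropy in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def gamma(p: float, d: int) -> float:
    """gamma(p) = (sqrt(p) + sqrt((d-1)(1-p)))^2 / d."""
    return (math.sqrt(p) + math.sqrt((d - 1.0) * (1.0 - p))) ** 2 / d


def isotropic_eof(p: float, d: int) -> float:
    """Three-branch formula for E_f of the isotropic state (assumed form)."""
    if p <= 1.0 / d:
        return 0.0
    if p <= 4.0 * (d - 1.0) / d ** 2:
        g = gamma(p, d)
        return h(g) + (1.0 - g) * math.log(d - 1.0)
    # Only reached for d >= 3, since 4(d-1)/d^2 = 1 when d = 2.
    return d * math.log(d - 1.0) / (d - 2.0) * (p - 1.0) + math.log(d)


def wootters_eof(concurrence: float) -> float:
    """Two-qubit entanglement of formation from the concurrence."""
    return h((1.0 + math.sqrt(max(0.0, 1.0 - concurrence ** 2))) / 2.0)


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.8, 1.0):
        print(p, isotropic_eof(p, 2), wootters_eof(max(0.0, 2.0 * p - 1.0)))
```

For d = 2 both columns agree, and for d = 3 the second and third branches meet continuously at $p = 8/9$.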
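Further, Rains [75] showed that

$$E_{r,S}(\rho_{I,p}) = E_{r,\mathrm{PPT}}(\rho_{I,p}) = \begin{cases} \log d - (1-p)\log(d-1) - h(p) & \text{if } p \ge \frac{1}{d} \\ 0 & \text{if } p < \frac{1}{d}. \end{cases}$$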
Rains [45] also proved 
Ej3 (p1.p) = logd — (1 — p) log(d + 1) — h(p). 
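Since $\tau^A(|\tau^A(\rho_{I,p})|) \ge 0$ (Exe. 8.83), we obtain

$$E_{c,e}^{\mathrm{PPT},\infty}(\rho_{I,p}) = \log\|\tau^A(\rho_{I,p})\|_1 = \begin{cases} \log dp & \text{if } p \ge \frac{1}{d} \\ 0 & \text{if } p < \frac{1}{d}. \end{cases} \qquad (8.331)$$

In the system $\mathbb{C}^2\otimes\mathbb{C}^2$, Terhal and Horodecki [30] proved

$$E_{sr}(\rho_{I,1/\sqrt{2}}) = \log 2, \quad E_{sr}((\rho_{I,1/\sqrt{2}})^{\otimes 2}) = \log 2.$$

Since $E_{c,e}^{\mathrm{PPT},\infty}(\rho_{I,p}) \le E_{c,e}^{C,\infty}(\rho_{I,p})$, from (8.331) and (8.114) we obtain

$$E_{c,e}^{C,\infty}(\rho_{I,1/\sqrt{2}}) = \log\sqrt{2} \quad \text{for } C = \rightarrow, \leftarrow, \leftrightarrow, S, \mathrm{PPT}.$$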

Exercises 


8.75 Prove (8.318) following the steps below.
(a) Show that $A^TS_2A = S_2\det A$ for a 2 × 2 matrix A.
(b) Show that $(A\otimes B)\rho(A\otimes B)^*(S_2\otimes S_2)\overline{(A\otimes B)\rho(A\otimes B)^*}(S_2\otimes S_2) = |\det A|^2|\det B|^2(A\otimes B)\rho(S_2\otimes S_2)\bar{\rho}(S_2\otimes S_2)(A\otimes B)^{-1}$.

® @B)* -_ let et ° 
(c) Show that Col FCA@B ABBY) = THA@B)ABB)) 
(d) Prove (8.318). 


8.76 Prove (8.319) following the steps below.

(a) Show that $\bar{\rho}_{\mathrm{Bell},p} = \rho_{\mathrm{Bell},p}$.

(b) Show that $(S_2\otimes S_2)\rho_{\mathrm{Bell},p}(S_2\otimes S_2) = \rho_{\mathrm{Bell},p}$.
(c) Prove (8.319).


8.77 Prove (8.320) following the steps below.
(a) Show that $\mathrm{Tr}(A\otimes B)\rho_{\mathrm{Bell},p}(A\otimes B)^* = \frac{1}{2}\sum_i p_i\mathrm{Tr}\,A^*AS_iB^T\bar{B}S_i$.
(b) Show that $\frac{1}{2}\mathrm{Tr}\,A^*AS_iB^T\bar{B}S_i \ge |\det A||\det B|$.
(c) Prove (8.320).


8.78 Show that any maximally correlated state can be written as $\rho_{a,b} = a|00\rangle\langle 00| +
b|00\rangle\langle 11| + b|11\rangle\langle 00| + (1-a)|11\rangle\langle 11|$ with two non-negative numbers a and b in
a 2 × 2 system by choosing suitable bases.


8.79 Show that $C_o(\rho_{a,b}) = 2b$ following the steps below.

(a) Show that $(S_2\otimes S_2)\bar{\rho}_{a,b}(S_2\otimes S_2) = \rho_{1-a,b}$.

(b) Show that $\rho_{a,b}(S_2\otimes S_2)\bar{\rho}_{a,b}(S_2\otimes S_2) = (a(1-a)+b^2)|00\rangle\langle 00| + 2ab|00\rangle\langle 11| +
2(1-a)b|11\rangle\langle 00| + (a(1-a)+b^2)|11\rangle\langle 11|$.

(c) Show that $C_o(\rho_{a,b}) = 2b$.


8.80 Show that $H(\rho_{a,b}) = h\Big(\frac{1+\sqrt{(2a-1)^2+4b^2}}{2}\Big)$.


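8.81 Assume that the input state is the maximally entangled state $|\Phi_d\rangle\langle\Phi_d|$ between
the channel input system $\mathcal{H}_A$ and the reference system $\mathcal{H}_R$. Show that the output
state of the depolarizing channel $\kappa_{d,\lambda}$ (Example 5.3) (transpose depolarizing channel
$\kappa^T_{d,\lambda}$ (Example 5.9)) is equal to the isotropic state (Werner state) as

$$(\kappa_{d,\lambda}\otimes\iota_R)(|\Phi_d\rangle\langle\Phi_d|) = \rho_{I,\frac{1+\lambda(d^2-1)}{d^2}} \qquad (8.332)$$
$$(\kappa^T_{d,\lambda}\otimes\iota_R)(|\Phi_d\rangle\langle\Phi_d|) = \rho_{W,\frac{(1-(d+1)\lambda)(d-1)}{2d}}. \qquad (8.333)$$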


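8.82 Show that $\tau^A(|\tau^A(\rho_{W,p})|) \ge 0$ following the steps below.
(a) Show that $\rho_{W,p} = qI + r\tau^A(|\Phi_d\rangle\langle\Phi_d|)$, where $q = \frac{1-p}{d(d+1)} + \frac{p}{d(d-1)}$ and $r = \frac{1-p}{d+1} - \frac{p}{d-1}$.
(b) Show that $\tau^A(\rho_{W,p}) = q(I - |\Phi_d\rangle\langle\Phi_d|) + (q+r)|\Phi_d\rangle\langle\Phi_d|$.

(c) Show that $\tau^A(|\tau^A(\rho_{W,p})|) = q(I - \frac{1}{d}F) + \frac{|q+r|}{d}F \ge 0$.


8.83 Show that $\tau^A(|\tau^A(\rho_{I,p})|) \ge 0$ for $p > \frac{1}{d}$ following the steps below. (This
inequality is trivial when $p \le \frac{1}{d}$ because $\tau^A(\rho_{I,p}) \ge 0$.)
(a) Show that $\tau^A(\rho_{I,p}) = \frac{1-p}{d^2-1}I + \frac{d^2p-1}{d(d^2-1)}F = \frac{dp+1}{2d(d+1)}(I+F) - \frac{dp-1}{2d(d-1)}(I-F)$.
(b) Show that $|\tau^A(\rho_{I,p})| = \frac{1-p}{d^2-1}(I+F) + \frac{dp-1}{d(d-1)}I$.
(c) Show that $\tau^A(|\tau^A(\rho_{I,p})|) = \frac{1-p}{d^2-1}(I + d|\Phi_d\rangle\langle\Phi_d|) + \frac{dp-1}{d(d-1)}I \ge 0$.


8.84 Show that $(1-\lambda)\rho_{\mathrm{mix}} + \lambda\tau^A(|\Phi_d\rangle\langle\Phi_d|) \ge 0$ if and only if $-\frac{1}{d-1} \le \lambda \le \frac{1}{d+1}$,
where $\mathcal{H}_A = \mathcal{H}_B = \mathbb{C}^d$.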


8.17 Proof of Theorem 8.2 


We prove this theorem in the following steps: ①⇒②⇒③⇒①, ②⇒④⇒①. The proof
given here follows from Bhatia [12].
We first show ①⇒② for dimension d by induction. Let $t = (y_1 - x_1)/(y_1 - y_2) =
(x_2 - y_2)/(y_1 - y_2)$ for d = 2. Since $x \prec y$, we have $0 \le t \le 1$. Further, the relation

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1-t & t \\ t & 1-t \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$$

proves the case for d = 2. In the following proof, assuming that the result holds for
$d \le n-1$, we prove the case for d = n. Any permutation is expressed by a product
of T transforms. Hence, it is sufficient to show ② when $x_1 \ge x_2 \ge \ldots \ge x_n$ and
$y_1 \ge y_2 \ge \ldots \ge y_n$. Since $x \prec y$, we have $y_n \le x_1 \le y_1$. Choosing an appropriate
k, we have $y_k \le x_1 \le y_{k-1}$. When t satisfies $x_1 = ty_1 + (1-t)y_k$, the relation
$0 \le t \le 1$ holds. Let $T_1$ be the T transform among the first and kth elements defined
by t. Define

$$x' := (x_2, \ldots, x_n)^T, \qquad (8.335)$$

$$y' := (y_2, \ldots, y_{k-1}, (1-t)y_1 + ty_k, y_{k+1}, \ldots, y_n)^T. \qquad (8.336)$$

Then, $T_1y = (x_1, y')$. Since $x' \prec y'$ (as shown below), from the assumptions of the
induction there exist T transforms $T_f, \ldots, T_2$ such that $T_f\cdots T_2y' = x'$. Therefore,
$T_f\cdots T_2T_1y = T_f\cdots T_2(x_1, y') = (x_1, x') = x$, which completes the proof for this
part. We now show that $x' \prec y'$. For an integer m satisfying $2 \le m \le k-1$, we have

$$\sum_{j=2}^{m} y_j \ge \sum_{j=2}^{m} x_j. \qquad (8.337)$$

If $k \le m \le n$, then

$$\sum_{j=2}^{m} y'_j = \sum_{j=2}^{k-1} y_j + (1-t)y_1 + ty_k + \sum_{j=k+1}^{m} y_j$$
$$= \sum_{j=1}^{m} y_j - ty_1 + (t-1)y_k = \sum_{j=1}^{m} y_j - x_1 \ge \sum_{j=1}^{m} x_j - x_1 = \sum_{j=2}^{m} x_j,$$

which shows that $x' \prec y'$.
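
The step ①⇒② is constructive, and a small program can produce an explicit chain of T transforms. The sketch below does not follow the recursion of the proof literally: it uses the classical transfer variant (move weight from the last coordinate that is too large to the next coordinate that is too small), which also yields at most d − 1 T transforms. It assumes $x \prec y$ with both vectors sorted in decreasing order; the names `t_transforms` and `apply_steps` are illustrative.

```python
from typing import List, Tuple

# A step (j, k, t) stands for the T transform t*I + (1-t)*Q,
# where Q swaps the j-th and k-th coordinates (0-based indices).
Step = Tuple[int, int, float]


def t_transforms(x: List[float], y: List[float], tol: float = 1e-12) -> List[Step]:
    """Return T transforms mapping y to x, assuming x is majorized by y
    and both lists are sorted in decreasing order."""
    cur = list(y)
    d = len(x)
    steps: List[Step] = []
    for _ in range(d):  # each step fixes at least one coordinate
        above = [i for i in range(d) if cur[i] > x[i] + tol]
        if not above:
            break
        j = above[-1]  # largest index whose entry is still too large
        k = next((i for i in range(j + 1, d) if cur[i] < x[i] - tol), None)
        if k is None:  # cannot happen for exact input with x majorized by y
            break
        delta = min(cur[j] - x[j], x[k] - cur[k])
        t = 1.0 - delta / (cur[j] - cur[k])
        cur[j], cur[k] = (t * cur[j] + (1.0 - t) * cur[k],
                          t * cur[k] + (1.0 - t) * cur[j])
        steps.append((j, k, t))
    return steps


def apply_steps(y: List[float], steps: List[Step]) -> List[float]:
    """Apply the T transforms in order to a copy of y."""
    cur = list(y)
    for j, k, t in steps:
        cur[j], cur[k] = (t * cur[j] + (1.0 - t) * cur[k],
                          t * cur[k] + (1.0 - t) * cur[j])
    return cur


if __name__ == "__main__":
    x, y = [0.4, 0.35, 0.25], [0.7, 0.2, 0.1]
    steps = t_transforms(x, y)
    print(steps, apply_steps(y, steps))
```

Each step keeps the running vector sorted and still majorizing x, so the loop terminates after at most d − 1 transfers.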

Next, we show ②⇒③. The product of two double stochastic transition matrices
$A_1$ and $A_2$ is also a double stochastic transition matrix $A_1A_2$. Since a T transform is
a double stochastic transition matrix, we obtain ③.

For proving ③⇒①, it is sufficient to show that
for an arbitrary integer k and a set of k arbitrary integers $i_1, \ldots, i_k$ from 1 to d. This
can be shown from the fact that a xi) = kand S, x! < 1 for each j. 
We now show ②⇒④. For simplicity, we consider d = 2 and let


(yi /y2)"— (2/21 )(91/y2) (yi /y2) @1/22)-1 
= Qi /y2)?-1 (yi /y2)" = 
B= (x2/x1) Qi /y2)=1 (1/y2)?=Or/y2) 1/22) (8.338) 
O1/y2)°-1 (y1/¥2)°-1 


It can be verified that this is a stochastic transition matrix. Since 


2_ 
(1 /y2) ee x1 = (1 /y2)x1 a (1/92) 
ee Ly (y1/y2)2 aa | 1 


(y1/y2)?-1 

1 /yx)—I 
CT See eee | 1 ) 
aya —rauigere 2) X2 (yi/y2)2 —1 \Ori/y) J? 


we observe that B!ox © B*ox ®y. 
Let 7) be a T transform defined with respect to t between kth and /th elements 
(k <1), and define B! and B? as 


pik pli (ve/y1)? = C1/Xk) OK/ 1) (e/yD Ox /x1)—1 
= Ox/y)?-1 1/21 
b2* p2! (1 /xK) Ke / Y= 1 (ye/y)“ = Ox/ yd) ce / x1) 
(x /yi)?=1 (ve/y)?-1 


Li = O/H ya _ Ox/ XI — %e 
(e/y) — 1” Ox/y) —1 


ifi £k, 1. 


Then, Bl'ox © Beox ® y,if x = Toy. 

Further, if two stochastic transition matrices B,C satisfy y ~ (B/)* o x and 
z & (C')*oy for arbitrary integers i and j, then there exists an appropriate substitution 
s(j) such that 


y x s(j)((B/)* 0 x), 


where we identify the permutation s(j) and the matrix that represents it. Since 
si =O, 
zX(C oy =s{) ((G)(CY) 0 (()"Y) 
cxs(j) (((s(/))1(C')*) 0 (B/)* 0 x) = s(j) ((C's({))* 0 (B/)* 0 x) 
=s(j) ((C's(j))* 0 (B/)* ox) © (C's(j))* 0 (B/)* ox. 


Therefore, 
Lepr ew =T(Lew) oer

i,j J i 

= eso =) eo y=) Bo ae: 
j j j 


When we define the matrix D by (D‘/)* = = (C's(j))* o (B/)* (note that the pair 


i, j refers to one column), this matrix is a stochastic transition matrix and satisfies 
(D'/)* ox =z. (8.339) 
Using this and the previous facts, we obtain ②⇒④.


Finally, we show ④⇒①. It is sufficient to show the existence of a d-dimensional
vector c = (c;) with positive real elements such that 


d k 
GS, - Soe. Oey a (8.340) 


for arbitrary k. For this purpose, we choose k different integers i,, ..., i, such that 
k k 
Yi 
Yada, 
j=l j=l 


For each j, we choose the permutation s(j) and the positive real number d; such 
that (B/)* o x = djs(j)y. Note that )“_, dj = 1. Since 


d 
> b!"' x; = Xi, 
j=! 


d_ ek dk d 
1 =D, LD acon, -=L VT aso, 
Co Pegs 


t=1 j=l t=1 j=l t=1 j=l /=1 


j= 


k 


d k 
SsOuretd, D> sOw=*% 


j=) l=1 j=1 
we obtain


k d k 
>) Oise. SD > Oy =e 
t j=l l=1 ft jel 


where we used 5°, d, = 1. These relations show the existence of a vector c = (c;) 
satisfying (8.340). 


8.18 Proof of Theorem 8.3 


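Let $\rho$ be a separable state of $\mathcal{H}_A\otimes\mathcal{H}_B$. We can choose an appropriate set of vectors
$\{u_i\}_i$ in $\mathcal{H}_A$ and $\{v_i\}_i$ in $\mathcal{H}_B$ such that $\rho = \sum_i|u_i\otimes v_i\rangle\langle u_i\otimes v_i| = \sum_j\lambda_j|e_j\rangle\langle e_j|$,
where the RHS is the diagonalized form of $\rho$. From Lemma A.5 we can take an
isometric matrix $W = (w_{i,j})$ such that $u_i\otimes v_i = \sum_j w_{i,j}\sqrt{\lambda_j}e_j$. Since $W^*W = I$,
we have

$$\sum_i w^*_{i,j}u_i\otimes v_i = \sqrt{\lambda_j}e_j. \qquad (8.341)$$

Similarly, we diagonalize $\mathrm{Tr}_B\rho$ such that $\mathrm{Tr}_B\rho = \sum_k\lambda'_k|f_k\rangle\langle f_k|$. Then, we can take
an isometric matrix $W' = (w'_{i,k})$ such that $u_i = \sum_k w'_{i,k}\sqrt{\lambda'_k}f_k$.
Substituting this into (8.341), we obtain

$$\sqrt{\lambda_j}e_j = \sum_i\sum_k w^*_{i,j}w'_{i,k}\sqrt{\lambda'_k}f_k\otimes v_i.$$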
ik 


Taking the norm on both sides, we have 
def 
dj = > Dyers Dj. = bs wer or) . 
k ii! 


If we can show that $D_{j,k}$ is a double stochastic transition matrix, Condition ③ in
Theorem 8.2 implies that the eigenvalues of $\rho$ are majorized by those of $\mathrm{Tr}_B\rho$. Since


(= KW, jj (Wy, kb) Wi',j (vjr |U;) )- (Se Wij Uj! 


Ul 


w.,w; vj )>0 
Sven) 


and WW’ = I, W*W =, we obtain 
E (Teenie yiesnena )- A


k i,i 


= Boas 
= > W; Wig = 1. 
i 


We may similarly show that $\sum_j D_{j,k} = 1$. Hence, $D_{j,k}$ is a double stochastic
transition matrix.


8.19 Proof of Theorem 8.8 for Mixed States 


We show the ≤ part of (8.101) for a general state $\rho$. Let $\{E_{A,i}\otimes E_{B,i}\}_i$ be the
Choi-Kraus representation of an S-TP-CP map $\kappa$. Then,

$$\kappa(|\Phi_L\rangle\langle\Phi_L|) = \sum_i (E_{A,i}\otimes E_{B,i})|\Phi_L\rangle\langle\Phi_L|(E_{A,i}\otimes E_{B,i})^*.$$

Now, choose $y_i$ such that

$$p'_i := \mathrm{Tr}\,(E_{A,i}\otimes E_{B,i})|\Phi_L\rangle\langle\Phi_L|(E_{A,i}\otimes E_{B,i})^*,$$
$$p'_i|y_i\rangle\langle y_i| = (E_{A,i}\otimes E_{B,i})|\Phi_L\rangle\langle\Phi_L|(E_{A,i}\otimes E_{B,i})^*.$$

From Corollary 8.1 there exists a probabilistic decomposition $\{(p_i, x_i)\}$ of $\rho$ such
that

$$F(\kappa(|\Phi_L\rangle\langle\Phi_L|), \rho) = \sum_i\sqrt{p_ip'_i}\,|\langle x_i|y_i\rangle|.$$

Since the Schmidt rank of $y_i$ is at most L,

$$|\langle x_i|y_i\rangle| \le \sqrt{P(x_i, L)}. \qquad (8.342)$$

From the Schwarz inequality,

$$F(\kappa(|\Phi_L\rangle\langle\Phi_L|), \rho) \le \sum_i\sqrt{p_ip'_i}\sqrt{P(x_i, L)} \le \sqrt{\sum_i p'_i}\sqrt{\sum_i p_iP(x_i, L)} = \sqrt{\sum_i p_iP(x_i, L)}. \qquad (8.343)$$

Thus, we obtain the ≤ part of (8.101).
Conversely, if there exists a vector $y_i$ with a Schmidt rank of at most L that satisfies
the equality in (8.342) and

$$p'_i = \frac{p_iP(x_i, L)}{\sum_j p_jP(x_j, L)},$$

the equality in (8.343) holds. Therefore, according to Theorem 8.4, there exists a
one-way LOCC satisfying the RHS of (8.99).


8.20 Proof of Theorem 8.9 for Mixed States 


8.20.1 Proof of Direct Part 


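The second equality in (8.108) holds according to (8.98) and Lemma A.1. We therefore
show the ≤ part of the first equality. Let us first show that

$$\min_{(p_i, x_i)}\Big\{\sum_i p_i\big(1 - P(x_i, [e^{nR}])\big)\ \Big|\ \sum_i p_i|x_i\rangle\langle x_i| = \rho^{\otimes n}\Big\}$$

converges to zero exponentially for $R > E_f(\rho)$. The convergence of this expression
to zero is equivalent to that of the value inside the square root on the RHS of (8.101) to
one. Hence, we consider the latter quantity, i.e., the value inside the square root. Choose
a decomposition $\{(p_i, x_i)\}$ such that $R > \sum_i p_iE(|x_i\rangle\langle x_i|)$. Let $\rho_i := \mathrm{Tr}_B|x_i\rangle\langle x_i|$.
From (8.102),

$$\sum_i p_iP^c(x_i, [e^R]) \le \sum_i p_ie^{\frac{\log\mathrm{Tr}\,\rho_i^{1-s} - sR}{1-s}}.$$

In particular, since $\frac{\log\mathrm{Tr}(\rho_i\otimes\rho_j)^{1-s} - 2sR}{1-s} = \frac{\log\mathrm{Tr}\,\rho_i^{1-s} - sR}{1-s} + \frac{\log\mathrm{Tr}\,\rho_j^{1-s} - sR}{1-s}$, we obtain

$$\sum_{i^n} p^n_{i^n}P^c(x^n_{i^n}, [e^{nR}]) \le \sum_{i^n} p^n_{i^n}e^{\frac{\log\mathrm{Tr}(\rho^n_{i^n})^{1-s} - snR}{1-s}} = \Big(\sum_i p_ie^{\frac{\log\mathrm{Tr}\,\rho_i^{1-s} - sR}{1-s}}\Big)^n, \qquad (8.344)$$

where we define $x^n_{i^n} := x_{i_1}\otimes\cdots\otimes x_{i_n}$, $\rho^n_{i^n} := \rho_{i_1}\otimes\cdots\otimes\rho_{i_n}$ with respect to
$i^n := (i_1, \ldots, i_n)$, and $p^n$ is the independent and identical distribution of p. Further,
we obtain

$$\lim_{s\to 0}\frac{1}{s}\log\Big(\sum_i p_ie^{\frac{\log\mathrm{Tr}\,\rho_i^{1-s} - sR}{1-s}}\Big) = \frac{d}{ds}\log\Big(\sum_i p_ie^{\frac{\log\mathrm{Tr}\,\rho_i^{1-s} - sR}{1-s}}\Big)\Big|_{s=0} = \sum_i p_i(H(\rho_i) - R) < 0.$$

Note that the inside of the logarithm on the left-hand side (LHS) of the above
equation is equal to 1 when s = 0. Taking an appropriate $1 > s_0 > 0$, we have
$\log\Big(\sum_i p_ie^{\frac{\log\mathrm{Tr}\,\rho_i^{1-s_0} - s_0R}{1-s_0}}\Big) < 0$. Thus, the RHS of (8.344) exponentially converges to
zero. Therefore, we obtain $E_c^{\to}(\rho) \le E_f(\rho)$. Similarly, $E_c^{\to}(\rho^{\otimes k}) \le E_f(\rho^{\otimes k})$.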


Next, we choose a sequence {m,} such that (m, — 1)k <n < m,k with respect 
to n. Denote the partial trace of (H4 @ Hg)®”"*—” by C,. Then, 


F(p2™*, Kim (Pin, ) (Pz, |= ze ko" Ch OKim, ( ®,,,, ) ae I)) (8.345) 
for K,, L. Therefore, if the LHS of (8.345) converges to zero, then the RHS also 
converges to zero. Since 


1 — 
lim — | Os bm = — lim 
: k moo 


C,, is a local quantum operation, and E> (p®*) < E f (p®*), and we have 


Ee (p®) _ Ex(p®) 


ET < 
< (pS k = 2 


Considering inf;,, we obtain the < part of (8.108). 


8.20.2 Proof of Converse Part 


Let us first consider the following lemma as a preparation. 


Lemma 8.23 Let p be a probability distribution $p = \{p_i\}_{i=1}^d$. Then,

$$\sum_{i=L+1}^{d} p_i^{\downarrow} \ge \frac{H(p) - \log L - \log 2}{\log(d-L) - \log L}. \qquad (8.346)$$

Proof By defining the double stochastic transition matrix $A = (a_{i,j})$ as

$$a_{i,j} := \begin{cases} \frac{1}{L} & \text{if } i, j \le L \\ \frac{1}{d-L} & \text{if } i, j > L \\ 0 & \text{otherwise,} \end{cases}$$

the image $Ap$ satisfies $(Ap)_i = \begin{cases} \frac{P(p,L)}{L} & \text{if } i \le L \\ \frac{1-P(p,L)}{d-L} & \text{if } i > L. \end{cases}$ From Condition ③ in Theorem
8.2 we have $Ap \prec p$. Therefore,

$$H(p) \le H(Ap) = -P(p,L)\log\frac{P(p,L)}{L} - (1-P(p,L))\log\frac{1-P(p,L)}{d-L}.$$

Since the binary entropy $h(x)$ is at most $\log 2$, we have

$$P^c(p,L)(\log(d-L) - \log L) + \log 2$$
$$\ge P^c(p,L)(\log(d-L) - \log L) + h(P(p,L)) \ge H(p) - \log L.$$

We thus obtain (8.346).
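
Lemma 8.23 can be sanity-checked on random distributions. The sketch below assumes that the left-hand side of (8.346) is the total weight of all but the L largest probabilities, that logarithms are natural, and that $1 \le L < d/2$ so that the denominator is positive; the helper names are illustrative.

```python
import math
import random
from typing import List


def shannon_entropy(p: List[float]) -> float:
    """Shannon entropy H(p) in nats."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def tail_mass(p: List[float], L: int) -> float:
    """Total probability outside the L largest entries."""
    return sum(sorted(p, reverse=True)[L:])


def lemma_lower_bound(p: List[float], L: int) -> float:
    """RHS of (8.346); requires 1 <= L < d/2 so that log(d-L) > log L."""
    d = len(p)
    if not 1 <= L < d / 2:
        raise ValueError("need 1 <= L < d/2")
    return (shannon_entropy(p) - math.log(L) - math.log(2.0)) / (
        math.log(d - L) - math.log(L))


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        w = [rng.random() for _ in range(16)]
        p = [v / sum(w) for v in w]
        print(tail_mass(p, 3), ">=", lemma_lower_bound(p, 3))
```

For the uniform distribution on 16 points and L = 2 the bound evaluates to $\log 4/\log 7 \approx 0.71$, below the true tail weight 0.875.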


We now show the ≥ part of Theorem 8.9 by using Lemma 8.23. Consider the sequence
of S-TP-CP maps $\{\kappa_n\}$ and the sequence of maximally entangled states $\{|\Phi_{L_n}\rangle\}$
satisfying

$$F(\kappa_n(|\Phi_{L_n}\rangle\langle\Phi_{L_n}|), \rho^{\otimes n}) \to 1. \qquad (8.347)$$


Combining (8.100) and (8.101) in Theorem 8.8 and Lemma 8.23, we have 


al =e | 


> pilxi) (xl -| 


1 — F?(kn(I®z,)(®z, I), p®”) 


> min para (x;, Ln) 


(pi Xi) 


E(|x;) (x;|) — log L, — log2 
> min > ; (|xi) (xi |) — log og 
Pim) log(d" — Ly) — log Ln 
_ Es(p®") — log Ly — log2 _ Ese") log Ln log? 
~ log(d” — Ly») —logL, log(d? In) 7 fogs 


Using (8.347) and Lemma A.1, we obtain 


et = 7.5 Ioet: 
0 = jim, (8 ) _ log 


n n 


) (1 = (Foen(1®1,(42,),0°")") 


Hie) Wesla. lard Exp?" iced. 
> Tim ( le me )- fi PA a PE Ee 
noo n 


no 


n n n n 


Thus, we obtain 


E,(p®" (eke. low, 
fi iy OP ie Es ee, 
n n 


n> oo n 


which completes the proof of Theorem 8.9.
8.21 Historical Note


8.21.1 Entanglement Distillation 


The study of conversion among entangled states in an asymptotic setting was initi- 
ated by Bennett et al. [22]. These researchers derived the direct and converse parts 
of Theorem 8.6 in the pure-state case (Exercise 8.33). After this research, Lo and 
Popescu [2] considered convertibility between two pure states with LOCC. They found
that the two-way LOCC could be simulated by the one-way LOCC (Theorem 8.1) in 
the finite regime when the initial state was pure. They also obtained the optimal value 
of the probability that we will succeed in converting a given pure partially entan- 
gled state into a desired maximally entangled state by LOCC (8.71). Following this 
research, Nielsen [15] completely characterized the LOCC convertibility between 
two pure states by use of majorization (pure-state case of Theorem 8.4). Vidal [16] 
extended this result to the mixed-state case, i.e., he showed the mixed-state case of 
Theorem 8.4. Using Nielsen’s condition, Morikoshi and Koashi [26] proved that the 
optimal deterministic distillation with an initial pure state can be realized only by 
two-pair collective manipulations in each step. Applying the method of types to the
optimal failure probability (the optimal success probability) for distillation with
an initial pure state, Hayashi et al. [25] derived the optimal generation rate with
an exponential constraint for the failure probability (for the success probability).
They also treated this problem with the fidelity criterion. Further, Hayashi [21]
extended this result to the non-i.i.d. case. Regarding the mixed-state case, Bennett 
et al. [28] discussed the relation between distillation and quantum error correction, 
which will be mentioned in Sect. 9.6. They derived several characterizations of the 
two-way LOCC distillation as well as of the one-way LOCC distillation. They also 
conjectured the Hashing inequality (8.121). Rains [75] showed this inequality in the 
maximally correlated case and the relation (8.143). Horodecki et al. [81] showed 
that (8.122) holds if this inequality holds. They also initiated a unified approach, 
which has been established by Donald et al. [32] as Theorem 8.10. Modifying the 
discussion by Devetak [82], Devetak and Winter [67] proved the inequality for any 
mixed state. 

For the converse part, Bennett et al. [22] proved the converse part of the pure-state
case by constructing the dilution protocol attaining the entropy rate. Then, proposing
the entanglement of relative entropy $E_{r,S}(\rho)$, Vedral and Plenio [1] proved the inequality
$E_{d,2}^{C}(\rho) \le E_{r,S}(\rho)$. In this book, its improved version (Theorem 8.7) is derived by
combining their idea and the strong converse of quantum Stein's lemma. Then, we
obtain the strong converse inequality $E_{d,2}^{C,\dagger}(\rho) \le E_{r,S}(\rho)$ even for the mixed case.
Horodecki et al. [81] obtained the first inequality in (8.120). Further, establishing a
unified approach, Donald et al. [32] simplified its proof.

Christandl and Winter [34] introduced squashed entanglement and proved the
inequality $E_{d,2}^{\leftrightarrow}(\rho) \le E_{sq}(\rho)$. Concerning PPT operations, Rains [45] proved inequality
(8.223). This book extended this result to the strong converse inequality (8.226).
8.21.2 Entanglement Dilution and Related Topics


Regarding the dilution, as mentioned above, Bennett et al. [22] proved Theorem 8.9
in the pure-state case. Bennett et al. [28] introduced the entanglement of formation.
Following these results, Hayden et al. [29] proved Theorem 8.9. In this book, we
proved Theorem 8.9 in a slightly different way from that given in [29]. In Sect. 8.20.2, we
rigorously optimized the fidelity in the finite regime and proved Theorem 8.9 by
taking its limit.

Further, Lo and Popescu [39] showed that the bound can be attained by classical
communication with the square root of n bits in the pure-state case. Further, Hayden
and Winter [83] and Harrow and Lo [84] proved the optimality of Lo and Popescu's
protocol. Using their results, Terhal et al. [40] showed that the optimal rate of dilution
with zero-rate communication can be characterized by the entanglement of purification.
They also showed that it is lower bounded by the quantity $C_d^{A\to B}(\rho)$, which
was introduced by Henderson and Vedral [41]. As a problem related to dilution with
zero-rate communication, we may consider the problem of generating a given separable
state from common randomness. This problem with the classical setting has been
solved by Wyner [85]. Theorem 8.13 is its quantum extension.

For entanglement of exact cost for PPT operations, Audenaert et al. [47] derived its 
lower bound. Concerning entanglement of exact cost for LOCC operations, Terhal 
and Horodecki [30] focused on the Schmidt rank and calculated it for the two- 
tensor product of the two-dimensional isotropic state. Joining these, we derived the 
entanglement of exact cost for these settings in this example. 

As a related problem, we often consider how to characterize a pure entangled state
producing a given state with nonzero probability by LOCC. This problem is called 
stochastic convertibility. Owari et al. [86] treated this problem in infinite-dimensional 
systems using the partial order. Miyake [87] treated this problem in tripartite systems 
using a hyperdeterminant. Ishizaka [88] focused on PPT operations and showed that 
any pure entangled state can be stochastically converted from another pure entangled 
state by PPT operations. 

For the discord $D(B|A)_\rho$, we can easily show Inequality (8.178). The equality
condition (Lemma 8.14) was given in the first edition of this book. However, at that
time, the proof was not perfect. Then, Datta [89] and Dakic et al. [90] showed this
argument later. The proof of Lemma 8.14 has been given by filling in the gap in the
first edition of this book.


8.21.3 Additivity 


Many researchers [31, 36] conjectured additivity of entanglement formation, i.e., the 
equation (8.144) holds for arbitrary two bipartite states. This relation can be general- 
ized to the superadditivity of entanglement formation [37] as (8.145). Shimono [91] 
showed this conjecture when the states are in the antisymmetric space of the system
$\mathbb{C}^3\otimes\mathbb{C}^3$, and Yura [77] extended it as (8.325). Matsumoto and Yura [78] extended
this result to a more general case as (8.326). Then, Shimono et al. [92] numerically
checked that there is no counterexample for superadditivity of entanglement
formation. However, Hastings [61] showed the existence of a counterexample for
superadditivity of entanglement formation. So, we find that the dimension of the Hilbert
spaces discussed in the numerical demonstration by [92] is not high enough. That is,
the counterexample requires higher dimensions. In this book, we discuss a counterexample
based on Fukuda [60]. Fukuda [60] employed the large deviation on the
sphere with the Haar measure, which is summarized in Sect. 2.6. Using Theorem
2.11 in Sect. 2.6, we show Lemma 8.19, which plays an important role in this counterexample.
In fact, a lemma similar to Lemma 8.19 is often employed in quantum information
theory. However, an important factor is sometimes dropped in such a lemma,
so care is needed when using this type of lemma.


8.21.4 Security and Related Topics 


Theorem 8.15 plays an important role in the security evaluation. When s = 1, we 
can replace v by 1 in (8.291) of Theorem 8.15. This argument is called the leftover
hash lemma. Its classical version has been shown by Bennett et al. [93] and
Hastad et al. [94]. Renner [68] extended it to the quantum case when the security 
criterion is given by d|(A : E|p). However, to derive an exponential upper bound 
for the security criterion like (8.293), we need Theorem 8.15. Its classical version 
was shown by Hayashi [69] and the quantum version was shown by Hayashi [65]. 
Recently, the tightness of the exponential evaluation was shown in the classical case 
by Hayashi et al. [95].

In fact, there is a duality relation between the security and the coherence. To 
clarify this relation, Renes [70] showed Theorem 8.16. This kind of relation can be 
used for showing the performance of the code for the quantum-state transmission 
and the Hashing inequality (8.121) for entanglement distillation. For this purpose, 
we need to relax the condition of Theorem 8.16. So, we derive Theorem 8.17 as a 
generalization of Theorem 8.16. That is, Theorem 8.17 has a wider applicability than 
Theorem 8.16. Then, Theorem 8.17 will be employed in Sect. 9.6. Further, using this 
idea, we can derive the opposite inequality to the entropic uncertainty relation (7.51) 
as Theorem 8.18. 


8.22 Solutions of Exercises 


Exercise 8.1 Diagonalize p as >°; pi|u)(uA|. Then, the purification is given as 
Ix) = >, /pilu#, uP). So, we have Tr, |x)(x| = >, pilu?)(u?|, which implies 
that H(p) = A (Tra |x) (x|). 
Exercise 8.2 Choose the purifications $|x\rangle$ and $|y\rangle$ of $\rho$ and $\sigma$ satisfying (8.10),
i.e., $1 - (\mathrm{Tr}|\sqrt{\rho}\sqrt{\sigma}|)^2 = 1 - |\langle x|y\rangle|^2$. Exercise 3.18 implies that $1 - |\langle x|y\rangle|^2 =
d_1^2(|x\rangle\langle x|, |y\rangle\langle y|)$. The monotonicity of $d_1$ for the partial trace implies that
$d_1^2(|x\rangle\langle x|, |y\rangle\langle y|) \ge d_1^2(\rho, \sigma)$. The combination of these relations yields the desired inequality.
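
As a sanity check of the resulting inequality $d_1(\rho,\sigma) \le \sqrt{1 - F^2(\rho,\sigma)}$, the sketch below evaluates both sides for commuting (diagonal) states, where the quantities reduce to classical formulas. It assumes the normalization $d_1 = \frac{1}{2}\|\rho-\sigma\|_1$ and $F = \mathrm{Tr}|\sqrt{\rho}\sqrt{\sigma}|$; the probability vectors are arbitrary illustrative choices.

```python
import math

def d1(p, q):
    """Trace distance of diagonal states: (1/2) * sum |p_i - q_i| (assumed normalization)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def fidelity(p, q):
    """Fidelity of diagonal states: sum sqrt(p_i q_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Illustrative pairs of diagonal states (eigenvalue lists).
pairs = [
    ([0.5, 0.5], [0.9, 0.1]),
    ([0.2, 0.3, 0.5], [0.6, 0.3, 0.1]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1]),
]

# (left-hand side, right-hand side) of d_1 <= sqrt(1 - F^2) for each pair.
results = [(d1(p, q), math.sqrt(max(0.0, 1 - fidelity(p, q) ** 2))) for p, q in pairs]
for lhs, rhs in results:
    print(f"d1 = {lhs:.4f} <= sqrt(1 - F^2) = {rhs:.4f}")
```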


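Exercise 8.3 Let $u$ and $v$ be purifications of $\rho$ and $\sigma$ such that $F(\rho,\sigma) = F(|u\rangle\langle u|, |v\rangle\langle v|)$.
Then, $\mathrm{Tr}\sqrt{|u\rangle\langle u|}\sqrt{|v\rangle\langle v|} = |\langle u|v\rangle|^2 = F^2(|u\rangle\langle u|, |v\rangle\langle v|)$. Using the
monotonicity of $\phi(1/2, \rho, \sigma)$, we have $F^2(\rho,\sigma) \le \mathrm{Tr}\sqrt{\rho}\sqrt{\sigma}$.

In addition, $F(\rho,\sigma) = \mathrm{Tr}|\sqrt{\rho}\sqrt{\sigma}| \ge \mathrm{Tr}\sqrt{\rho}\sqrt{\sigma}$.
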
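Exercise 8.4 $F_e^2(\rho,\kappa) = \langle x|\kappa\otimes\iota_R(|x\rangle\langle x|)|x\rangle = \sum_i \langle x|E_i\otimes I|x\rangle\langle x|E_i^*\otimes I|x\rangle =
\sum_i |\mathrm{Tr}\,E_i\rho|^2$.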
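
The identity $F_e^2(\rho,\kappa) = \sum_i |\mathrm{Tr}\,E_i\rho|^2$ can be illustrated on a qubit. The sketch below assumes a hypothetical bit-flip channel with Kraus operators $\sqrt{1-q}\,I$ and $\sqrt{q}\,X$ and a diagonal input state (both are illustrative choices); it compares the Kraus formula with a direct evaluation of $\langle x|\kappa\otimes\iota_R(|x\rangle\langle x|)|x\rangle$ on the purification $|x\rangle = \sum_i \sqrt{p_i}|i,i\rangle$.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def fe2_from_kraus(kraus, rho):
    """Entanglement fidelity squared via sum_i |Tr E_i rho|^2."""
    return sum(abs(trace(matmul(E, rho))) ** 2 for E in kraus)

def fe2_from_purification(kraus, probs):
    """<x| kappa (x) iota (|x><x|) |x> with |x> = sum_i sqrt(p_i)|i>|i>."""
    d = len(probs)
    # Amplitude matrix x[a][r] of the purification (real entries).
    x = [[math.sqrt(probs[a]) if a == r else 0.0 for r in range(d)] for a in range(d)]
    total = 0.0
    for E in kraus:
        Ex = matmul(E, x)  # amplitudes of (E (x) I)|x>
        amp = sum(x[a][r] * Ex[a][r] for a in range(d) for r in range(d))
        total += abs(amp) ** 2
    return total

q = 0.2  # assumed flip probability
kraus = [
    [[math.sqrt(1 - q), 0.0], [0.0, math.sqrt(1 - q)]],  # sqrt(1-q) * I
    [[0.0, math.sqrt(q)], [math.sqrt(q), 0.0]],          # sqrt(q) * X
]
probs = [0.7, 0.3]
rho = [[probs[0], 0.0], [0.0, probs[1]]]

fe2_kraus = fe2_from_kraus(kraus, rho)
fe2_direct = fe2_from_purification(kraus, probs)
print(fe2_kraus, fe2_direct)
```

For this choice $\mathrm{Tr}\,X\rho = 0$, so both evaluations reduce to $1 - q$.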


Exercise 8.5 


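(a) Consider the singular value decomposition of the matrix $\{\mathrm{Tr}\,E_iA_j\rho\}_{i,j}$.

That is, choose unitaries $U = (u_{i,j})$ and $V = (v_{i,j})$ such that $\sum_{i',j'} u_{i,i'}\,\mathrm{Tr}\,E_{i'}A_{j'}\rho\,v_{j',j}$
is a diagonal matrix with positive and real diagonal elements. Then, we define
$E_i' := \sum_{i'} u_{i,i'}E_{i'}$ and $A_j' := \sum_{j'} v_{j',j}A_{j'}$. Due to Exercise 5.4, $\{E_i'\}_i$ and $\{A_j'\}_j$
are Choi–Kraus representations of $\kappa$ and $\kappa'$, respectively.

(b) We retake Choi–Kraus representations $\{E_i\}_i$ and $\{A_j\}_j$ of $\kappa$ and $\kappa'$ based on (a).
Hence, $\{E_iA_j\}_{i,j}$ is a Choi–Kraus representation of $\kappa\circ\kappa'$. Define $p_i := \mathrm{Tr}\,A_i\rho A_i^*$
and $A_i' := A_i/\sqrt{p_i}$. (8.19) yields that

$$F_e^2(\rho,\kappa\circ\kappa') = \sum_{i,j}|\mathrm{Tr}\,E_iA_j\rho|^2 = \sum_i|\mathrm{Tr}\,E_iA_i\rho|^2 = \sum_i p_i|\mathrm{Tr}\,E_iA_i'\rho|^2.$$

Now, we choose the largest one $|\mathrm{Tr}\,E_1A_1'\rho|^2$ among $\{|\mathrm{Tr}\,E_iA_i'\rho|^2\}_i$. Hence, we obtain
$F_e^2(\rho,\kappa\circ\kappa') = \sum_i p_i|\mathrm{Tr}\,E_iA_i'\rho|^2 \le |\mathrm{Tr}\,E_1A_1'\rho|^2$.

(c) Note that $EA = (U|E|^{1/2})(|E|^{1/2}A)$. Apply the Schwarz inequality for the inner
product $\mathrm{Tr}\,X^*Y\rho$. Then, $|\mathrm{Tr}\,EA\rho|^2 = |\mathrm{Tr}(U|E|^{1/2})(|E|^{1/2}A)\rho|^2 \le \mathrm{Tr}\,U|E|U^*\rho\;\mathrm{Tr}\,A^*|E|A\rho$.
Since $\mathrm{Tr}\,A^*|E|A\rho = \mathrm{Tr}\,|E|A\rho A^* \le \mathrm{Tr}\,A\rho A^* = 1$, we have $|\mathrm{Tr}\,EA\rho|^2 \le
\mathrm{Tr}\,U|E|U^*\rho = \mathrm{Tr}\,EU^*\rho$.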

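(d) First, note that $F_e^2(\rho,\kappa\circ\kappa_{U^*}) \ge |\mathrm{Tr}\,E_1U^*\rho|^2$. We choose a Choi–Kraus
representation $\{E_i\}_i$ of $\kappa$ and a matrix $A$ according to (b). Take $U^*$ to be an isometry
from $\mathcal{H}_A$ to $\mathcal{H}_B$ under the polar decomposition $E_1 = U|E_1|$. (c) implies that
$F_e^2(\rho,\kappa\circ\kappa') \le |\mathrm{Tr}\,E_1A\rho|^2 \le \mathrm{Tr}\,U|E_1|U^*\rho = \mathrm{Tr}\,E_1U^*\rho \le F_e(\rho,\kappa\circ\kappa_{U^*})$.

(e) In this case, we can take the polar decomposition $E_1 = U|E_1|$ such that $U$ is an
isometry from $\mathcal{H}_B$ to $\mathcal{H}_A$. Hence, (c) implies that

$$F_e^2(\rho,\kappa\circ\kappa') \le |\mathrm{Tr}\,E_1A\rho|^2 \le \mathrm{Tr}\,U|E_1|U^*\rho = \mathrm{Tr}\,P_CU|E_1|U^*P_C\rho = \mathrm{Tr}\,E_1U^*P_C\rho P_C$$
$$= (\mathrm{Tr}\,P_C\rho)\,\mathrm{Tr}\,E_1U^*\frac{P_C\rho P_C}{\mathrm{Tr}\,P_C\rho} \le (\mathrm{Tr}\,P_C\rho)\,F_e\!\left(\frac{P_C\rho P_C}{\mathrm{Tr}\,P_C\rho},\kappa\circ\kappa_{U^*}\right).$$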


Exercise 8.6 We have $|\mathrm{Tr}(A_1 + A_2 i)\rho|^2 = |\mathrm{Tr}\,A_1\rho + \mathrm{Tr}\,A_2\rho\, i|^2 = (\mathrm{Tr}\,A_1\rho)^2 +
(\mathrm{Tr}\,A_2\rho)^2$, where $A_1$ and $A_2$ are Hermitian matrices. Hence, the function $\rho \mapsto
|\mathrm{Tr}\,A\rho|^2$ is a convex function. Using (8.19) and a Choi–Kraus representation $\{A_j\}_j$
of $\kappa$, we have, for $\rho = \sum_i p_i\rho_i$,

$$F_e^2(\rho,\kappa) = \sum_j |\mathrm{Tr}\,A_j\rho|^2 \le \sum_i p_i\sum_j|\mathrm{Tr}\,A_j\rho_i|^2 = \sum_i p_iF_e^2(\rho_i,\kappa).$$


Exercise 8.7 Markov’s inequality implies a — are dc elements x; among 
cl yoe such that (1 — F?(x;, K(x;))) < ee la (1 — F?(x;, K(x;))). Since 
Geka) == Fas, s(tr41))), we have 


(1 — F?(Xd,—de+1s K(Xdy—de+1))) 


a | ; 7 ; 
Ege ds < (1 = F°(a;, KO) = max {1 — F?(x, K(x))}, 


where the final equation follows from the construction of x;. 
Exercise 8.8 


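(a) Choose a purification $|x\rangle = \sum_i \sqrt{p_i}\,|u_i, u_i^R\rangle$, where $\{u_i^R\}$ is a CONS of the reference
system $\mathcal{H}_R$. Hence, (8.19) implies that

$$F_e^2(\rho,\kappa) = \langle x|\kappa\otimes\iota_R(|x\rangle\langle x|)|x\rangle = \sum_{i,j}\sqrt{p_i}\sqrt{p_j}\,\langle u_i,u_i^R|\kappa\otimes\iota_R(|x\rangle\langle x|)|u_j,u_j^R\rangle$$
$$= \sum_{i,j} p_ip_j\,\langle u_i|\kappa(|u_i\rangle\langle u_j|)|u_j\rangle.$$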
(b) We denote the expectation under the uniform distribution with respect to ¢ = 


(d1,..., 0a) by E. ie average E 1 an am CL Pi 497-9" +97") ig nonzero only when 
Ms gs y 


jf id 
j=] rad! = j” or j = j” and j’ = j””. Hence, 


EF*(u(@), K(u())) 
=E >, pipet ert Or Pi $97 (ue [Cu jr) (jr |) [ue jr) 


Td 


VEY ee 
SE >) pipj(ujle(\uj)ujlu jr) +E >) pipjujle (uj) (ujl)|uj) 
jj" jj’ 
=F2(p, 8) + >) Dj Pe (uel (\uj)(u jl) lux). 
jk 
(c) We have 


d d 
SY pe (aa les(lur) (url) ue) S pa (Xiao) steno 


k= k= 
=p2 Tr — |uy) (ui |)K(\u1) (ui |) < pod, 
and
yo (ug ee(\uej) (uj |)|ue) = Sp ux | «(| j) (ue j|)|ue) 
j=2 kAj j=2 0 kAj 
d 
=Xpmn SS lua) (ued Ku jug] | <2 ppp Ted = |uj) (uj Cuy) (ul) 
k#j j=2 


ont 
j=2 


(d) We show that $(1 + p_2 - p_1)p_1 \le \frac{1}{2}$ as follows. When $p_1 \le 1/2$, $\max_{p_2}(1 + p_2 -
p_1)p_1 = p_1 \le 1/2$. When $p_1 > 1/2$, $\max_{p_2}(1 + p_2 - p_1)p_1 = 2(1 - p_1)p_1 \le 1/2$.
Using (b) and (c), we have


1— F2(p,6) + >> pipe 
izk 
= (ux K(|uj) (uj|) luk) + EC — F?(u(¢), K(u(@)))) 


d d 
<Pi >) Pe (uel (lu) (url) + >) D2 pj Pe (el e(uj) (ue jl) lux) 


k=2 jor kAj 
+ E(1 — F?(u(¢), K(u()))) 
d 
<pipod + >> pjpid + Ed — F?(u(9), n(u($)))) 
j=2 


d 


< Pipo+ > pipitl 6 
j=2 


3 
<(1+ p-popit es 5° 
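
The elementary bound $(1 + p_2 - p_1)p_1 \le \frac{1}{2}$ used in part (d) can be checked by a brute-force grid search. This is a minimal sketch; it assumes the constraints $p_2 \le p_1$ and $p_1 + p_2 \le 1$, which are inferred from the two cases of the argument rather than stated explicitly there.

```python
def worst_case(steps=200):
    """Largest value of (1 + p2 - p1) * p1 on a grid of admissible (p1, p2)."""
    best = 0.0
    for i in range(1, steps + 1):
        p1 = i / steps
        p2_max = min(p1, 1 - p1)  # assumed constraints: p2 <= p1 and p1 + p2 <= 1
        for j in range(steps + 1):
            p2 = p2_max * j / steps
            best = max(best, (1 + p2 - p1) * p1)
    return best

value = worst_case()
print(value)
```

On this grid the maximum is attained at $p_1 = p_2 = 1/2$, matching the boundary between the two cases.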


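Exercise 8.9 Consider the depolarizing channel $\kappa_{2,\lambda}$. (5.12) implies that

$$\max_x\{1 - F^2(x,\kappa_{2,\lambda}(x))\} = 1 - \left(\lambda + \frac{1}{2}(1-\lambda)\right) = \frac{1}{2}(1-\lambda).$$

We also have

$$1 - F_e^2(\rho_{\mathrm{mix}},\kappa_{2,\lambda}) = 1 - \frac{3\lambda+1}{4} = \frac{3(1-\lambda)}{4}.$$

Hence, we obtain the equality in (8.26).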
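
The two closed-form values above can be reproduced from an explicit Kraus representation. The sketch below assumes the qubit depolarizing channel has the form $\kappa_{2,\lambda}(\rho) = \lambda\rho + (1-\lambda)\rho_{\mathrm{mix}}$ and uses the standard Pauli Kraus operators for it; the value of `lam` is an arbitrary illustrative choice. By unitary covariance every pure input gives the same fidelity, so evaluating it at $|0\rangle$ suffices.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def scale(c, a):
    return [[c * v for v in row] for row in a]

def apply_channel(kraus, rho):
    """sum_i E_i rho E_i^*."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for E in kraus:
        term = matmul(matmul(E, rho), dagger(E))
        for i in range(n):
            for j in range(n):
                out[i][j] += term[i][j]
    return out

lam = 0.4  # assumed depolarizing parameter
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
a = math.sqrt((1 + 3 * lam) / 4)
b = math.sqrt((1 - lam) / 4)
kraus = [scale(a, I2), scale(b, X), scale(b, Y), scale(b, Z)]

rho_mix = [[0.5, 0], [0, 0.5]]
fe2 = sum(abs(trace(matmul(E, rho_mix))) ** 2 for E in kraus)  # entanglement fidelity squared

rho0 = [[1, 0], [0, 0]]                       # pure input |0><0|
f2 = apply_channel(kraus, rho0)[0][0].real    # <0| kappa(|0><0|) |0>

print(1 - f2, 1 - fe2)
```

The ratio of the two printed numbers is $2/3 = d/(d+1)$ for $d = 2$, the same factor that appears in Exercise 8.10.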
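Exercise 8.10


(a) (5.12) implies that

$$1 - F_e^2(\rho_{\mathrm{mix}},\kappa_{d,\lambda}) = 1 - \left(\lambda + \frac{1-\lambda}{d^2}\right) = \frac{(1-\lambda)(d+1)(d-1)}{d^2}$$

and

$$\mathrm{E}_{\mu,x}\left[1 - F^2(x,\kappa_{d,\lambda}(x))\right] = 1 - \left(\lambda + \frac{1-\lambda}{d}\right) = \frac{(1-\lambda)(d-1)}{d}.$$

Thus,

$$\frac{d}{d+1}\left(1 - F_e^2(\rho_{\mathrm{mix}},\kappa_{d,\lambda})\right) = \mathrm{E}_{\mu,x}\left[1 - F^2(x,\kappa_{d,\lambda}(x))\right].$$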


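(b) For any element $y \in \mathcal{H}_A$, we have

$$\mathrm{E}_{\mu,x}F^2(x,\kappa(x)) = \int_{SU(d_A)} \langle y|U^*\kappa(U|y\rangle\langle y|U^*)U|y\rangle\,\nu(dU) = F^2(y,\kappa_{d,\lambda}(y)).$$

(c) Let $|z\rangle$ be a purification of $\rho_{\mathrm{mix}}$. Since $\bar{U}U^T = I$, (8.18) implies that

$$F_e^2(\rho_{\mathrm{mix}},\kappa_{d,\lambda}) = \langle z|\kappa_{d,\lambda}\otimes\iota_R(|z\rangle\langle z|)|z\rangle$$
$$= \int_{SU(d_A)} \langle z|(U^*\otimes I_R)\,\kappa\otimes\iota_R\big((U\otimes I_R)|z\rangle\langle z|(U^*\otimes I_R)\big)\,(U\otimes I_R)|z\rangle\,\nu(dU)$$
$$= \int_{SU(d_A)} \langle z|(I_A\otimes\bar{U})\,\kappa\otimes\iota_R\big((I_A\otimes U^T)|z\rangle\langle z|(I_A\otimes\bar{U})\big)\,(I_A\otimes U^T)|z\rangle\,\nu(dU)$$
$$= \int_{SU(d_A)} \langle z|\kappa\otimes\iota_R(|z\rangle\langle z|)|z\rangle\,\nu(dU) = \langle z|\kappa\otimes\iota_R(|z\rangle\langle z|)|z\rangle = F_e^2(\rho_{\mathrm{mix}},\kappa).$$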


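(d) Using (a), (b), and (c), we have

$$\frac{d}{d+1}\left(1 - F_e^2(\rho_{\mathrm{mix}},\kappa)\right) = \frac{d}{d+1}\left(1 - F_e^2(\rho_{\mathrm{mix}},\kappa_{d,\lambda})\right)$$
$$= \mathrm{E}_{\mu,x}\left[1 - F^2(x,\kappa_{d,\lambda}(x))\right] = \mathrm{E}_{\mu,x}\left[1 - F^2(x,\kappa(x))\right].$$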
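
The algebraic identity of part (a) can be verified exactly with rational arithmetic. This is a small sketch, assuming the depolarizing-channel expressions $F_e^2 = \lambda + (1-\lambda)/d^2$ and $\mathrm{E}[F^2] = \lambda + (1-\lambda)/d$ reconstructed above; the function names and the sampled values of $d$ and $\lambda$ are illustrative.

```python
from fractions import Fraction

def one_minus_fe2(d, lam):
    """1 - F_e^2(rho_mix, kappa_{d,lambda}) for the depolarizing channel."""
    return 1 - (lam + (1 - lam) / Fraction(d * d))

def one_minus_avg_f2(d, lam):
    """Average of 1 - F^2(x, kappa_{d,lambda}(x)) over pure states x."""
    return 1 - (lam + (1 - lam) / Fraction(d))

cases = [(d, Fraction(k, 10)) for d in (2, 3, 5, 16) for k in range(0, 11)]
identity_holds = all(
    Fraction(d, d + 1) * one_minus_fe2(d, lam) == one_minus_avg_f2(d, lam)
    for d, lam in cases
)
print(identity_holds)
```

Using `Fraction` avoids floating-point tolerance and makes the check an exact equality for every sampled pair.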


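Exercise 8.11 Let $|x\rangle$ be a purification of $\rho$. Then,

$$F_e^2\Big(\rho,\sum_i f_i\kappa_i\Big) = \langle x|\sum_i f_i\,\kappa_i\otimes\iota_R(|x\rangle\langle x|)|x\rangle
= \sum_i f_i\,\langle x|\kappa_i\otimes\iota_R(|x\rangle\langle x|)|x\rangle = \sum_i f_iF_e^2(\rho,\kappa_i).$$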


Exercise 8.12 


(a) Since $X$ is a full rank matrix, $\rho_i = \frac{1}{p_i}XM_i^TX^*$ is pure if and only if $M_i^T$ is
rank-one.

(b) Assume that the states $\rho_i$ in (8.31) are orthogonal to each other. Then, $XM_i^TX^*X
M_j^TX^* = 0$, which is equivalent to $M_i^TX^*XM_j^T = 0$, i.e., $M_i(X^*X)^T
M_j = 0$. This condition holds if and only if the POVM $M = \{M_i\}$ is a PVM and
commutes with $(X^*X)^T$. We can also show the converse argument.


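Exercise 8.13 Denote the input and output systems of $\kappa$ by $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively.
Let $|x\rangle\langle x|$ be a purification of $\rho_{\mathrm{mix}}$. Choose the probabilistic decomposition as $\kappa\otimes
\iota_R(|x\rangle\langle x|) = \sum_i p_i|y_i\rangle\langle y_i|$. Then, the Schmidt rank of $|y_i\rangle$ is at most $d'$. That is,
the rank of $\mathrm{Tr}_B|y_i\rangle\langle y_i|$ is at most $d'$. Therefore, $\mathrm{Tr}_R\sqrt{\mathrm{Tr}_B|y_i\rangle\langle y_i|} \le \sqrt{d'}$. Thus,

$$\langle x|(\kappa'\circ\kappa)\otimes\iota_R(|x\rangle\langle x|)|x\rangle = \langle x|\sum_i p_i(\kappa'\otimes\iota_R)(|y_i\rangle\langle y_i|)|x\rangle$$
$$\le \sum_i p_iF^2\big(\mathrm{Tr}_A(\kappa'\otimes\iota_R)(|y_i\rangle\langle y_i|),\,\mathrm{Tr}_A|x\rangle\langle x|\big)$$
$$= \sum_i p_iF^2\big(\mathrm{Tr}_B|y_i\rangle\langle y_i|,\,\rho_{\mathrm{mix},R}\big) = \sum_i p_i\Big(\mathrm{Tr}\Big|\sqrt{\mathrm{Tr}_B|y_i\rangle\langle y_i|}\sqrt{\rho_{\mathrm{mix},R}}\Big|\Big)^2$$
$$= \sum_i p_i\frac{1}{d}\Big(\mathrm{Tr}\sqrt{\mathrm{Tr}_B|y_i\rangle\langle y_i|}\Big)^2 \le \frac{d'}{d}.$$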


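Exercise 8.14 When $\rho$ has a purification $|x\rangle$ with the given form, $\mathrm{Tr}_R|x\rangle\langle x| =
\sum_i p_i|x_i^A\rangle\langle x_i^A|\otimes|x_i^B\rangle\langle x_i^B|$. Conversely, we assume that $\rho$ is a separable state with
the form $\sum_i p_i|x_i^A\rangle\langle x_i^A|\otimes|x_i^B\rangle\langle x_i^B|$. The above given state $|x\rangle$ is a purification of $\rho$.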


Exercise 8.15 Since $x'$ is a pure state on $\mathcal{H}_{A'}\otimes\mathcal{H}_{E'}\otimes\mathcal{H}_R$, $H_{x'}(A'R) = H_{x'}(E')$ and
$H_{x'}(R) = H_{x'}(A'E')$.

Exercise 8.16 Since the final state on $\mathcal{H}_R\otimes\mathcal{H}_E\otimes\mathcal{H}_B$ is a pure state, $H(\rho)$ is equal
to the entropy of the final state on the reference system $\mathcal{H}_R$, $H(\kappa(\rho))$ is equal to the
entropy of the final state on $\mathcal{H}_R\otimes\mathcal{H}_E$, and $H_e(\rho,\kappa)$ is equal to the entropy of
the final state on the environment $\mathcal{H}_E$. We denote the final state
on $\mathcal{H}_R\otimes\mathcal{H}_E$ by $\sigma_{RE}$. Thus,

$$H(\rho) - I_c(\rho,\kappa) = H(\rho) - (H(\kappa(\rho)) - H_e(\rho,\kappa))
= H(\sigma_R) - H(\sigma_{RE}) + H(\sigma_E) = I_\sigma(R:E) \ge 0.$$
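Exercise 8.17 Let $|x\rangle$ be the purification of $\rho$. Use the second inequality in (3.48)
and the monotonicity of the trace norm concerning the partial trace on the reference
system. Then,

$$\|\rho - \kappa(\rho)\|_1 \le \|\kappa\otimes\iota_R(|x\rangle\langle x|) - |x\rangle\langle x|\|_1 \le \sqrt{2(1 - F_e(\rho,\kappa))} \le \frac{1}{e}.$$

Thus, Fannes inequality (5.92) yields that

$$H(\rho) - I_c(\rho,\kappa) = H(\rho) - H(\kappa(\rho)) + H(\kappa\otimes\iota_R(|x\rangle\langle x|))$$
$$\le |H(\rho) - H(\kappa(\rho))| + |H(\kappa\otimes\iota_R(|x\rangle\langle x|)) - 0|$$
$$\le \|\rho - \kappa(\rho)\|_1(\log d - \log\|\rho - \kappa(\rho)\|_1)$$
$$\quad + \|\kappa\otimes\iota_R(|x\rangle\langle x|) - |x\rangle\langle x|\|_1\big(\log d^2 - \log\|\kappa\otimes\iota_R(|x\rangle\langle x|) - |x\rangle\langle x|\|_1\big)$$
$$\le \sqrt{2(1 - F_e(\rho,\kappa))}\,\Big(3\log d - 2\log\sqrt{2(1 - F_e(\rho,\kappa))}\Big).$$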


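Exercise 8.18 Let $\kappa$ and $\kappa'$ be the TP-CP maps from $\mathcal{H}_A$ to $\mathcal{H}_B$ and from $\mathcal{H}_B$ to $\mathcal{H}_C$.
Let $\mathcal{H}_R$, $\mathcal{H}_E$, and $\mathcal{H}_{E'}$ be the reference system of $\mathcal{H}_A$ and the environment systems of
$\kappa$ and $\kappa'$. Then, we denote the output state of $\kappa$ on the whole system $\mathcal{H}_R\otimes\mathcal{H}_E\otimes\mathcal{H}_B$
by $\rho'$, and denote the output state of $\kappa'\circ\kappa$ on the whole system $\mathcal{H}_R\otimes\mathcal{H}_E\otimes\mathcal{H}_{E'}\otimes\mathcal{H}_C$
by $\rho''$. Since the strong subadditivity (5.83) of the von Neumann entropy implies that
$H_{\rho''}(C) - H_{\rho''}(RC) \le H_{\rho''}(E'C) - H_{\rho''}(RE'C)$, we have

$$I_c(\rho,\kappa) = H_{\rho'}(B) - H_{\rho'}(E) = H_{\rho''}(E'C) - H_{\rho''}(E)$$
$$= H_{\rho''}(E'C) - H_{\rho''}(RE'C) \ge H_{\rho''}(C) - H_{\rho''}(RC)$$
$$= H_{\rho''}(C) - H_{\rho''}(EE') = I_c(\rho,\kappa'\circ\kappa).$$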


Exercise 8.19 
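(a)

$$I(\rho,\kappa) = H(\kappa(\rho)) + H(\rho) - H(\kappa\otimes\iota_R(|x\rangle\langle x|))
= D(\kappa\otimes\iota_R(|x\rangle\langle x|)\,\|\,\kappa(\rho)\otimes\mathrm{Tr}_A|x\rangle\langle x|).$$

(b)

$$I(\rho,\kappa'\circ\kappa) = D((\kappa'\circ\kappa)\otimes\iota_R(|x\rangle\langle x|)\,\|\,(\kappa'\circ\kappa)(\rho)\otimes\mathrm{Tr}_A|x\rangle\langle x|)$$
$$\le D(\kappa\otimes\iota_R(|x\rangle\langle x|)\,\|\,\kappa(\rho)\otimes\mathrm{Tr}_A|x\rangle\langle x|) = I(\rho,\kappa).$$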


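Exercise 8.20 Let $\kappa$ be a TP-CP map from $\mathcal{H}_A$ to $\mathcal{H}_B$. Let $\mathcal{H}_R$ and $\mathcal{H}_E$ be the
reference system of $\mathcal{H}_A$ and the environment system of $\kappa$, respectively. Let $|x\rangle$ be
a purification of $\rho$ with the reference system $\mathcal{H}_R$. We also denote the unitary from
$\mathcal{H}_A$ to $\mathcal{H}_B\otimes\mathcal{H}_E$ as the Stinespring representation of $\kappa$ by $U$. Since $\sum_i p_i\kappa_i = \kappa$,
we have

$$I_c(\rho,\kappa) = H(\mathrm{Tr}_R\,\kappa\otimes\iota_R(|x\rangle\langle x|)) - H(\kappa\otimes\iota_R(|x\rangle\langle x|))$$
$$= -H_{\kappa\otimes\iota_R(|x\rangle\langle x|)}(R|B) \le -\sum_i p_iH_{\kappa_i\otimes\iota_R(|x\rangle\langle x|)}(R|B) = \sum_i p_iI_c(\rho,\kappa_i),$$

where the inequality follows from the concavity of the conditional entropy. Hence,

$$I(\rho,\kappa) = H(\rho) + I_c(\rho,\kappa) \le H(\rho) + \sum_i p_iI_c(\rho,\kappa_i) = \sum_i p_iI(\rho,\kappa_i).$$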

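Exercise 8.21 Let $\kappa'$ and $\kappa$ be the TP-CP maps from $\mathcal{H}_A$ to $\mathcal{H}_B$ and from $\mathcal{H}_B$ to
$\mathcal{H}_C$, respectively. Let $\mathcal{H}_R$, $\mathcal{H}_E$, and $\mathcal{H}_{E'}$ be the reference system of $\mathcal{H}_A$ and the
environment systems of $\kappa$ and $\kappa'$, respectively. Let $|x\rangle$ be a purification of $\rho$ with
the reference system $\mathcal{H}_R$. We also denote the unitary from $\mathcal{H}_A$ to $\mathcal{H}_B\otimes\mathcal{H}_{E'}$ as the
Stinespring representation of $\kappa'$ by $U$.

The monotonicity with respect to the partial trace on $\mathcal{H}_{E'}$ implies that

$$I(\kappa'(\rho),\kappa) = D(\kappa\otimes\iota_{R,E'}(U|x\rangle\langle x|U^*)\,\|\,\kappa(\kappa'(\rho))\otimes\mathrm{Tr}_B\,U|x\rangle\langle x|U^*)$$
$$\ge D(\kappa\otimes\iota_R(\mathrm{Tr}_{E'}U|x\rangle\langle x|U^*)\,\|\,\kappa(\kappa'(\rho))\otimes\mathrm{Tr}_{B,E'}U|x\rangle\langle x|U^*)$$
$$= D((\kappa\circ\kappa')\otimes\iota_R(|x\rangle\langle x|)\,\|\,(\kappa\circ\kappa')(\rho)\otimes\mathrm{Tr}_A|x\rangle\langle x|)$$
$$= I(\rho,\kappa\circ\kappa').$$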


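Exercise 8.22 Let $|x\rangle$ be a purification of $\rho$. Then, $U|x\rangle$ is a purification of $U\rho U^*$.

$$I_c(\rho,\kappa\circ\kappa_U) = H(\kappa\circ\kappa_U(\rho)) - H((\kappa\circ\kappa_U)\otimes\iota_R(|x\rangle\langle x|))$$
$$= H(\kappa(U\rho U^*)) - H(\kappa\otimes\iota_R(U|x\rangle\langle x|U^*)) = I_c(U\rho U^*,\kappa).$$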


Exercise 8.23 
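(c) We denote the density $|x'\rangle\langle x'|$ by $\rho'$. Then, $I_{\rho'}(A'E_A' : B'E_B') = D(\rho'_{A'E_A'B'E_B'}\,\|\,
\rho'_{A'E_A'}\otimes\rho'_{B'E_B'}) \ge D(\rho'_{E_A'E_B'}\,\|\,\rho'_{E_A'}\otimes\rho'_{E_B'}) = I_{\rho'}(E_A' : E_B')$. Hence, (8.34) implies that

$$I(\rho^A,\kappa^A) + I(\rho^B,\kappa^B) - I(\rho,\kappa^A\otimes\kappa^B)$$
$$= H_{\rho'}(A') + H_{\rho'}(B') - H_{\rho'}(A'B') - \big(H_{\rho'}(E_A') + H_{\rho'}(E_B') - H_{\rho'}(E_A'E_B')\big)$$
$$\quad + \big(H_{\rho'}(A'E_A') + H_{\rho'}(B'E_B') - H_{\rho'}(A'E_A'B'E_B')\big)$$
$$= I_{\rho'}(A' : B') - I_{\rho'}(E_A' : E_B') + I_{\rho'}(A'E_A' : B'E_B')$$
$$\ge I_{\rho'}(A' : B') \ge 0.$$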


Exercise 8.24 


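(a)

$$\mathrm{Tr}_{R,E}|x\rangle\langle x| = \sum_j p_j\,\mathrm{Tr}_R|x_j\rangle\langle x_j| = \sum_j p_j\rho_j.$$

(b) Since $\mathrm{Tr}_A(\kappa\otimes\kappa_E\otimes\iota_R)(|x\rangle\langle x|) = (\kappa_E\otimes\iota_R)\mathrm{Tr}_A(|x\rangle\langle x|)$ and $\mathrm{Tr}_{R,E}(\kappa\otimes\kappa_E\otimes
\iota_R)(|x\rangle\langle x|) = \kappa(\rho)$, we have

$$D\big((\kappa_E\otimes\iota_{A,R})(\kappa\otimes\iota_{R,E})(|x\rangle\langle x|)\,\|\,(\kappa_E\otimes\iota_{A,R})(\kappa(\rho)\otimes\mathrm{Tr}_A(|x\rangle\langle x|))\big)$$
$$= D\big((\kappa\otimes\kappa_E\otimes\iota_R)(|x\rangle\langle x|)\,\|\,\kappa(\rho)\otimes(\kappa_E\otimes\iota_R)\mathrm{Tr}_A(|x\rangle\langle x|)\big)$$
$$= H(\kappa(\rho)) + H((\kappa_E\otimes\iota_R)\mathrm{Tr}_A(|x\rangle\langle x|)) - H((\kappa\otimes\kappa_E\otimes\iota_R)(|x\rangle\langle x|))$$
$$= H(\kappa(\rho)) + \sum_j p_j(H(\rho_j) - \log p_j) - \sum_j p_j\big(H((\kappa\otimes\iota_R)(|x_j\rangle\langle x_j|)) - \log p_j\big)$$
$$= H(\kappa(\rho)) + \sum_j p_jH(\rho_j) - \sum_j p_jH((\kappa\otimes\iota_R)(|x_j\rangle\langle x_j|)).$$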

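(c)

$$\tilde{I}_c\Big(\sum_j p_j\rho_j,\kappa\Big) + H(\kappa(\rho))$$
$$= D\big((\kappa\otimes\iota_{R,E})(|x\rangle\langle x|)\,\|\,\kappa(\rho)\otimes\mathrm{Tr}_A(|x\rangle\langle x|)\big)$$
$$\ge D\big((\kappa_E\otimes\iota_{A,R})(\kappa\otimes\iota_{R,E})(|x\rangle\langle x|)\,\|\,(\kappa_E\otimes\iota_{A,R})(\kappa(\rho)\otimes\mathrm{Tr}_A(|x\rangle\langle x|))\big)$$
$$= H(\kappa(\rho)) + \sum_j p_jH(\rho_j) - \sum_j p_jH((\kappa\otimes\iota_R)(|x_j\rangle\langle x_j|))$$
$$= \sum_j p_j\tilde{I}_c(\rho_j,\kappa) + H(\kappa(\rho)),$$

which implies (8.50).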
Exercise 8.25 


(a) 


k 
> Piteloj. ©) = >) Pj He) — >. Pj) H(K@ ea )(lxj) (xj) 
j=l j J 


= > pj(H(p;) — log pj) — >< pj(H (Kk ® te) (Ix) (x)1)) — log pj) 
j J 
=H (ke (Tra(|x) (x) — A (Ke ® bark ® lr,e)(Ix)(x1)). 


(b) Inequality (5.80) implies that H (Tr, |x) (x|) < H(Ke (Trg |x) (x|)), which yields 
the last inequality in (8.56). 

(c) Inequality (5.81) implies that 7 (4 ¢ (Tra (|x) (x|))) —H (KE @la,r)(K@ LR, R) (|x) 
(x|)) < logk. Hence, (8.56) implies (8.51). 


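Exercise 8.26 Since (5.77) implies that $H(\kappa(\sum_{i=1}^k p_i\rho_i)) \ge \sum_{i=1}^k p_iH(\kappa(\rho_i))$,
(8.50) guarantees that

$$I\Big(\sum_{i=1}^k p_i\rho_i,\kappa\Big) = \tilde{I}_c\Big(\sum_{i=1}^k p_i\rho_i,\kappa\Big) + H\Big(\kappa\Big(\sum_{i=1}^k p_i\rho_i\Big)\Big)$$
$$\ge \sum_{i=1}^k p_i\tilde{I}_c(\rho_i,\kappa) + \sum_{i=1}^k p_iH(\kappa(\rho_i)) = \sum_{i=1}^k p_iI(\rho_i,\kappa).$$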


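Exercise 8.27 Since (5.79) guarantees that $H(\kappa(\sum_{i=1}^k p_i\rho_i)) \le \sum_{i=1}^k p_iH(\kappa(\rho_i)) +
\log k$, (8.51) implies that

$$I\Big(\sum_{i=1}^k p_i\rho_i,\kappa\Big) = \tilde{I}_c\Big(\sum_{i=1}^k p_i\rho_i,\kappa\Big) + H\Big(\kappa\Big(\sum_{i=1}^k p_i\rho_i\Big)\Big)$$
$$\le \sum_{i=1}^k p_i\tilde{I}_c(\rho_i,\kappa) + \sum_{i=1}^k p_iH(\kappa(\rho_i)) + 2\log k$$
$$= \sum_{i=1}^k p_iI(\rho_i,\kappa) + 2\log k.$$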


Exercise 8.28 Let $\kappa$ be a pinching of a PVM $\{|u_i\rangle\langle u_i|\}_{i=1}^d$ satisfying that $|u_1\rangle\langle u_1| =
|u\rangle\langle u|$. For any permutation $g$ on $\{2,\ldots,d\}$, we define the unitary $U_g := \sum_i |u_{g(i)}\rangle
\langle u_i|$. Assume that $\langle u|\rho|u\rangle = f$. Then, we have

$$H(\rho) \overset{(a)}{\le} H(\kappa(\rho)) \overset{(b)}{\le} H\Big(\sum_g \frac{1}{(d-1)!}U_g\kappa(\rho)U_g^*\Big)
= h(f) + (1 - f)\log(d - 1),$$

where (a) and (b) follow from (5.80) and the concavity of the entropy, respectively.
Hence, we have (8.57).

Applying (8.57) to the case when $|u\rangle$ is the purification $|x\rangle$ of $\rho$ and $\rho$ is
$\kappa\otimes\iota_R(|x\rangle\langle x|)$, we obtain (8.53).
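
The Fano-type bound $H(\rho) \le h(f) + (1-f)\log(d-1)$ can be illustrated for diagonal states, where $f$ is the first diagonal entry. The sketch below is only a numerical spot check on a few hand-picked probability vectors (illustrative values, natural logarithm), not a proof.

```python
import math

def shannon(p):
    """Entropy of a probability vector (natural log)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def binary_entropy(f):
    return shannon([f, 1 - f])

def fano_bound(f, d):
    """h(f) + (1 - f) log(d - 1)."""
    return binary_entropy(f) + (1 - f) * math.log(d - 1)

# Eigenvalue lists of diagonal states; f = p[0] plays the role of <u|rho|u>.
states = [
    [0.9, 0.1],
    [0.5, 0.3, 0.2],
    [0.25, 0.25, 0.25, 0.25],
    [0.1, 0.6, 0.2, 0.1],
]
gaps = [fano_bound(p[0], len(p)) - shannon(p) for p in states]
print(gaps)
```

The gap vanishes when the remaining weight $1-f$ is spread uniformly over the other $d-1$ levels, which is exactly the state produced by the permutation average in step (b).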


Exercise 8.29 Consider the unitary matrix 


Si 


co 


/ Pol * * * 
J/pil * * * 
af Prt * * * 
3 /p3l * * * 


0 
S 
0 
0 


0 
0 
S 


ooo 


ooo 
iy 


0 


nA 


as a Stinespring representation in C? @ C+, where the elements * of the second matrix 
are chosen appropriately to preserve the unitarity. Then, the channel «” given in (5.7) 
is given as 
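$$\kappa^E(\rho)_{i,j} = \sqrt{p_ip_j}\,\mathrm{Tr}\,S_i\rho S_j^*.$$

Since $\kappa^E(\rho_{\mathrm{mix}})_{i,j} = \delta_{i,j}p_i$, we have $H_e(\kappa,\rho_{\mathrm{mix}}) = H(\kappa^E(\rho_{\mathrm{mix}})) = H(p)$.
Since $\kappa^E(|e_0\rangle\langle e_0|) \cong \begin{pmatrix} p_0+p_3 & 0\\ 0 & p_1+p_2\end{pmatrix}$ and $\kappa^E(|e_1\rangle\langle e_1|) \cong \begin{pmatrix} p_0+p_3 & 0\\ 0 & p_1+p_2\end{pmatrix}$,
we have $H_e(\kappa,|e_0\rangle\langle e_0|) = H(\kappa^E(|e_0\rangle\langle e_0|)) = h(p_0+p_3)$ and $H_e(\kappa,|e_1\rangle\langle e_1|) =
H(\kappa^E(|e_1\rangle\langle e_1|)) = h(p_0+p_3)$.


Exercise 8.30 The map $p = (p_j) \mapsto \big(\sum_j \langle v_i|\,p_j|u_j\rangle\langle u_j|\,|v_i\rangle\big)_i$ is double stochastic.
Further, we have $q_i = \langle v_i|\big(\sum_j p_j|u_j\rangle\langle u_j|\big)|v_i\rangle$. Hence, we have $q \prec p$.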
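
The double stochastic map of Exercise 8.30 can be made concrete in two dimensions. The sketch below assumes real orthonormal bases, with $\{u_j\}$ the standard basis and $\{v_i\}$ a rotation of it by an arbitrary angle, and it uses the convention that $q \prec p$ means the sorted partial sums of $q$ never exceed those of $p$; all numerical values are illustrative.

```python
import math

def majorized_by(q, p, tol=1e-12):
    """True if q is majorized by p: sorted partial sums of q never exceed those of p."""
    qs, ps = sorted(q, reverse=True), sorted(p, reverse=True)
    cq = cp = 0.0
    for a, b in zip(qs, ps):
        cq += a
        cp += b
        if cq > cp + tol:
            return False
    return abs(cq - cp) < 1e-9  # both must have the same total weight

theta = 0.3  # assumed rotation angle between the two bases
u = [[1.0, 0.0], [0.0, 1.0]]
v = [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]

# T[i][j] = |<v_i|u_j>|^2 is the transition matrix of the map p -> q.
T = [[sum(v[i][k] * u[j][k] for k in range(2)) ** 2 for j in range(2)] for i in range(2)]

p = [0.9, 0.1]
q = [sum(T[i][j] * p[j] for j in range(2)) for i in range(2)]

row_sums = [sum(T[i]) for i in range(2)]
col_sums = [sum(T[i][j] for i in range(2)) for j in range(2)]
print(q, majorized_by(q, p))
```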


Exercise 8.31 Let pmix be the uniform distribution. (A;) < Pmix if and only if the size 
of the uniform distribution pmix is less than 1/ Me Then, Theorem 8.4 with p; = 61,; 
guarantees the desired argument. 


Exercise 8.32 This follows immediately from Exercise 8.31. 


Exercise 8.33 The state ,/M/\" Trg |u) (u|®",/ M;*" is a completely mixed state 


whose support is that of Ve, The measurement outcome on A is the same as 
that on B. So, the resultant state with the measurement outcome q is a maximally 
entangled state with the size |77"|. 

For an arbitrary real number € > 0, due to Theorem 2.6, the probability of the case 
when H(qg) > H(p,) — € goes to zero exponentially. So, we have ES | (\u) (ul) > 
H(p,) — €. Since € is arbitrary, we obtain the desired argument. 


Exercise 8.34 For any separable state 0, we have 


D(ppell, p lo) 
= TD prado Macy log DP) Nui) or — logo 


“Sst (uj: \(log(\u‘: ene, ee log pi. 


i,j 
“En jD(\uh:?) (us? |||0) — H(p) 


= 3 DijEr.s(\u;j (ui) — H(p) 
ij 


=logd — H(p). 


Exercise 8.35 The optimal protocol satisfying —i log L(kKy) = r is given as the 
operation «,, given in (8.86) and (8.87) with r = r/ satisfying 


i= Tr(p®" _ en) {p3" _ enn > 0} 


nr 
e = P 
en 


(8.348) 


In this case, we have 
£1 (Fens lu)(ul®") = Tr(p" — e-™*) { p" — em > Ol . (8.349)
Since r, — r, (2.187) implies that 


1 
——logées(Kn, |) (u|®”) > mak —w(s| Trp |u)(u|) + sr 
n ss 


= max s(Aji45(py) — 1) 
s>0 


Exercise 8.36 Consider the case when L(k,) = e””. Then, we apply (8.84) at the 
Proof of Theorem 8.7 to the case with o = 5°; A;|uA ® u?) (uA @ uP |. We choose 
s = —t. Then, we have ¢(s) = $(—t) = tH_;(p,). Hence, we obtain (8.92). 


Exercise 8.37 


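(a) Since $\kappa(\sigma)$ is separable, (8.94) follows from (8.7).
(b) Assume that $|u\rangle$ has the Schmidt decomposition $\sum_i\sqrt{\lambda_i}\,|u_i^A\rangle|u_i^B\rangle$ and that $\lambda_1 =
\lambda_1^\downarrow$. When $\sigma = |u_1^A\rangle\langle u_1^A|\otimes|u_1^B\rangle\langle u_1^B|$, $\mathrm{Tr}\,|u\rangle\langle u|\sigma = \lambda_1^\downarrow$. Hence, it is enough to show
$\mathrm{Tr}\,|u\rangle\langle u|\sigma \le \lambda_1^\downarrow$ when $\sigma$ is a pure state $|x\rangle|y\rangle$.

This argument can be shown by

$$|\langle u|x\rangle|y\rangle|^2 = \Big|\sum_i\sqrt{\lambda_i}\,\langle u_i^A|x\rangle\langle u_i^B|y\rangle\Big|^2
= \Big|\langle\bar{y}|\sum_i\sqrt{\lambda_i}\,|\bar{u}_i^B\rangle\langle u_i^A|x\rangle\Big|^2 \le \Big\|\sum_i\sqrt{\lambda_i}\,|\bar{u}_i^B\rangle\langle u_i^A|\Big\|^2 = \lambda_1^\downarrow,$$

where $|\bar{y}\rangle$ is the complex conjugate of $|y\rangle$.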

(c) (a) follows from the relation min,¢s, (Tr |g) (Dg|K(a))7! > 4 shown by (a). 
(b) follows from the definition of the dual map x«*. (c) follows from the fact that 
0 < K*(|®g)(®a|) < I. (e) follows from (b). (d) can be shown as follows. The 
maximum maxg<r<y-Tr |u) (ujT=1 (Ir To)! is realized when T = |u)(u|, which does 
not depend on a. So, we have 


hate ee To) '|Tr|u)(ulT = 1} 


min(Tr To)~! = min(Tr |u) (ulo)~!. 
~ oe T<r: Me (ul l oeS, oeSs 


(d) The separable TP-CP map achieving the rate $-\log\lambda_1^\downarrow$ is given in Exercise 8.31.
The inequality ≤ in (8.93) can be shown by combining (8.96) and the relation


max {log L(K)| €2(k, p) = O} = max{log d| Tr K((u)(u|)| Pa) (Pal = 1}. 
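
The claim in part (b), that the overlap of an entangled pure state with any product state is at most its largest Schmidt coefficient, can be explored numerically. The sketch below assumes a two-qubit state $\sqrt{\lambda_1}|00\rangle + \sqrt{\lambda_2}|11\rangle$ and scans only real product vectors on an angular grid, which is a restriction made here for simplicity; the value of $\lambda_1$ is illustrative.

```python
import math

def max_product_overlap(l1, steps=180):
    """Grid maximum of |<u|x,y>|^2 over real product states |x>|y> for
    |u> = sqrt(l1)|00> + sqrt(1 - l1)|11>."""
    l2 = 1 - l1
    best = 0.0
    for i in range(steps + 1):
        a = math.pi * i / steps          # |x> = cos(a)|0> + sin(a)|1>
        for j in range(steps + 1):
            b = math.pi * j / steps      # |y> = cos(b)|0> + sin(b)|1>
            amp = math.sqrt(l1) * math.cos(a) * math.cos(b) \
                + math.sqrt(l2) * math.sin(a) * math.sin(b)
            best = max(best, amp * amp)
    return best

lam1 = 0.7  # assumed largest Schmidt coefficient
overlap = max_product_overlap(lam1)
print(overlap)
```

The grid maximum is reached at $|x\rangle = |y\rangle = |0\rangle$, i.e., at the product of the leading Schmidt vectors, as in the argument above.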


Exercise 8.38 First, notice that any pure state on the subspace {u@u—u@v|u, v € C} 
has the form wu ® v2 — v2 ® v1) with orthogonal normalized vectors v; and v2. 
Thus, when p has a decomposition >", p;|x;)(x;|, we have H (Trg |x;) (x;|) = log 2, 
which implies that EF ¢(p) = log 2. 
Exercise 8.39 When $\rho$ is a pure state, (8.115) follows from (8.64) in Theorem 8.4.
In the mixed state case, we choose the decomposition $\rho = \sum_i p_i|x_i\rangle\langle x_i|$ such that
$E_f(\rho) = \sum_i p_iH(\mathrm{Tr}_B|x_i\rangle\langle x_i|)$. Then,

$$\sum_i p_iH(\mathrm{Tr}_B|x_i\rangle\langle x_i|) = \sum_i p_iE_f(|x_i\rangle\langle x_i|)
\ge E_f\Big(\sum_i p_i\kappa(|x_i\rangle\langle x_i|)\Big) = E_f(\kappa(\rho)),$$

which implies (8.115).


Exercise 8.40 Due to (8.109), the RHS of (8.99) in Theorem 8.8 approaches 0 
exponentially when R > E(p) and L = [e"*]. Hence, we obtain E&-"(\u)(u|) > 
E(Trg |u) (uJ). 


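Exercise 8.41 We make a decomposition $\rho = \sum_i p_i|x_i\rangle\langle x_i|$ such that $\sum_i p_iH(\mathrm{Tr}_B|
x_i\rangle\langle x_i|) = E_f(\rho)$. Then, we choose a separable state $\sigma_i$ such that $H(\mathrm{Tr}_B|x_i\rangle\langle x_i|) =
E_{r,S}(|x_i\rangle\langle x_i|) = D(|x_i\rangle\langle x_i|\,\|\,\sigma_i)$. The joint convexity of the relative entropy
guarantees that

$$E_{r,S}(\rho) \le D\Big(\rho\,\Big\|\,\sum_i p_i\sigma_i\Big) = D\Big(\sum_i p_i|x_i\rangle\langle x_i|\,\Big\|\,\sum_i p_i\sigma_i\Big)$$
$$\le \sum_i p_iD(|x_i\rangle\langle x_i|\,\|\,\sigma_i) = \sum_i p_iH(\mathrm{Tr}_B|x_i\rangle\langle x_i|) = E_f(\rho).$$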


Exercise 8.42 


(a) Based on Lemma 8.2, we choose purifications x, and y, with the reference system 
Hr.n such that D(|x,)(Xnl, |¥n) (Yul) = bCPn, On). Hence, due to Exercises 3.24 and 
3.25, we have 


Ai (|Xn) (nls 1Yn) (Ynl) > O (8.350) 


because d1 (fn, ,) — 0. Then, we choose a decomposition o,, = >; Pn.ilYn.i) (Yn.il 
such that 3°; Pai (Tre lyni) (Yn il) = E ¢(o,). Using Lemma 8.3, we find a POVM 
M" = {M7} on the reference system He, such that Trr(J @ M")|yn) (yal = 
Pnil¥ni)(Yn,i|- Then, we make the decomposition p, = >); Pia lXni) (Xn,i by 
Tra ® M?)\Xn) (Xn = Pi [Xn,i) (Xn,i |. Thus, we have 


Es (on) = >) Pri A Tre |yni) (Yn il) = Heisgn tee lyn) (ou) (ALR) (8.351) 


SD PH le |Xn,i) (Xn il) = Aegan Cree bsx)(69)) (ALR). (8.352) 


U 


Since the monotonicity for d, implies 
$$d_1\big(\kappa_{M^n}\otimes\iota_A(\mathrm{Tr}_B|x_n\rangle\langle x_n|),\,\kappa_{M^n}\otimes\iota_A(\mathrm{Tr}_B|y_n\rangle\langle y_n|)\big)
\le d_1(|x_n\rangle\langle x_n|,\,|y_n\rangle\langle y_n|),$$


(5.104) and (8.350) imply that 


1 
jordin, dim Ha | Heggn @ea (Teg lyn) (yn) (ALR) — Aayn ca (Tre Len) onl) (ALR) 
wn 


—> 0. 


Thus, (8.351) and (8.352) yield the desired argument. 

(b) From (a), we find that E ¢(p,) < Ef (on) + o(log dim H4,,) when d} (pn, On) > 
0. Replacing p, and o,, we have Es(p,) => Es (on) + oog dim H,_,,). Hence, we 
have E¢(pn) = Ef (on) + odog dim 7H, _,). 


Exercise 8.43 


(a) Choose an extension 0/48” of o, such that I,ase(A : B|E) = Exq (on). We choose 
a purification | y,) of o42” with the reference system 7H». Based on on Lemma 8.2, 
we choose a purification x, of p, with the reference system Hg ® HR, such that 
b(\Xn) (Xnls L¥n) Nnl) = b(n, On). Hence, we have (8.350), which implies that 


dy (pAB® ABE) _, 0, (8.353) 


where p42 = Trp |Xn)(xp|. Thus, (5.106) in Exercise 5.40 guarantees that 


—______|I ase (A: B|E) — I,ase(A : BIE 0, 8.354 
joedima, |E) — [gaze ( |E)| > (8.354) 


which implies the desired argument. 
(b) The desired argument can be shown by the same way as (b) of Exercise 8.42. 


Exercise 8.44 Let states p, and o, on the bipartite system 74, ® 7z,, satisfy 
llOn — Onl — 0. Based on (8.125), we choose M,, € C such that EC (on) — 
— Hay, (0,)(A| BE). Since the monotonicity of d; yields that d\ (Ay, (On), ha, (Pn)) < 
di (On, Pn), (5.104) implies that 


1 


—————_| Hz. A|BE) — Hg, (c, (A|BE 0. 8.355 
Togdim Hag! fimim(AIBE) — Hy, (o,)(AIBE)| > (8.355) 
Thus, ES (Pn) = foe (On) + o(log dim 11,4 ,,). Therefore, we can show Condition E3 
(continuity) by the same way as (b) of Exercise 8.42. 


Exercise 8.45 Since the ≥ part is obvious, we show the ≤ part. Assume that $\rho$ and
$\sigma$ are states on $\mathcal{H}_{A_1}\otimes\mathcal{H}_{B_1}$ and $\mathcal{H}_{A_2}\otimes\mathcal{H}_{B_2}$, respectively. We choose an arbitrary
extension $\rho_{A_1,A_2,B_1,B_2,E}$ of $\rho\otimes\sigma$. Chain rule (5.109) implies that

$$I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_1A_2 : B_1B_2|E)$$
$$= I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_1A_2 : B_1|B_2E) + I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_1A_2 : B_2|E)$$
$$= I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_1 : B_1|A_2B_2E) + I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_2 : B_1|B_2E)$$
$$\quad + I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_2 : B_2|A_1E) + I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_1 : B_2|E)$$
$$\overset{(a)}{\ge} I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_1 : B_1|A_2B_2E) + I_{\rho_{A_1,A_2,B_1,B_2,E}}(A_2 : B_2|A_1E),$$

where (a) follows from the non-negativity of the conditional mutual information
(see (5.90)). Thus, we obtain the ≤ part of (8.128).


Exercise 8.46 The difference from (8.144) is only the point that |uj') is not neces- 
sarily orthogonal. However, the proof of (8.144) does not require the orthogonality 
of ae ). So, (8.138) can be shown by the same way. 


Exercise 8.47 Choose a local operation «, in class C from (H,4)®”" ® (Hg)®" to 
C* @C* satisfying that “£% — E°,(p) and (8.117). Then, the monotonicity (E2C) 


. . Can ¥ Ce (pan . e 
and E3’ yield that lim,_. a) > limpsoo Ethno) > limpsoo de E 590). 


Exercise 8.48 Due to (8.84), when a separable operation «,, satisfies that L(K,) = e”” 
any separable state o satisfies that 


—Dy 45 (plla)+r 


(Den |) (p2")| Ben) <e is (8.356) 
for s > 0. Taking the minimum for a, we have 
; —E\ ass (p)tsr 
(Deo |K,(9®")[Bowr) Se", (8.357) 


which implies (8.139). Since the same discussion holds with D,, ,(p||o), we obtain 
(8.140). 
Exercise 8.49 We choose the purification |x) = >°; /piluf, u?, x®). Then, 
Tra(M ® Ip.r)|x)(x| = 2 JPi JD; (up |My lus) |uP, xf) (U2, x? |, 
Thus, 
Tra.e(Mjr ® Ip,n)|X) (x1) = >) Pi Dy (UA Mi lu) (xP |xf) uP) (w?, 
ij 


Tra,e(Mi ® Ip, e)ix)(x] = >) pilus |Mi lua) ixf) (xf. 


Therefore, 
K(Tra,r(Mj ® Ig,r)|x)(x])
= D>) VPI Di (uA |My lu) (x? xP) (uF lu?) (uF uf) xf) xf | 


wij 


=> pilus |My |us)|x%)(xf| = Tra,p(Mi ® Ie,r)|x) (x1. 


i 


Exercise 8.50 Consider the case when $L(\kappa_n) = e^{nr}$. Then, we apply (8.84) in the
proof of Theorem 8.7 to the case with $\sigma = \sigma_n$. We choose $s = -t$. Then, we have
$\phi(s) = \phi(-t) = tD_{1+t}(\rho_n\|\sigma_n)$. Hence, we obtain (8.160).


Exercise 8.51 Let p be a state on H4 ® Hg and ky be a TP-CP map on Hy for 
X = A, B. Choose the Stinespring representation Ux, with the environment 7/x, 
of &x for X = A, B. We choose a purification |x) of p with the reference system 
Ha, ® He, such that E,(p) = HA (Treg |x)(x|). Then, U4 ® Ug|x) is a purification 
of K4 @ Kp(p) and satisfies H(U, @ Ug|x)(x|U ® Uz) = A (Trp |x) (x|). Hence, 
E,(Ka ® Kp(p)) < Ep(p), which implies E20. 

Let & be an operation containing quantum communication with size d. Alice’s 
operation is given as instrument {rey with Alice’s resultant system 7{4,. Take 
its indirect measurement (Hz, U, |z)(z|, {E;}4 ,) given in Theorem 7.3. Then, con- 
sidering the Stinespring representation Ug; of Bob’s operation for i, we have a 
purification (7, E; ® Ugi)(U @ Ip|x) ® |v?) of «(p). Hence, (5.82) yields 


d 
H(Trpg K(p)) = H (™ S| EU (Trp |x) pues) 


i=1 
<H(U (Trg |x) (x|)U*) + logd = H((Trg |x)(x|)) + logd = Ep(p) + logd. 


Exercise 8.52 Let states p, and o, on the bipartite system H,, ® Hg,, satisfy 
di(Pn, On) — 0. We choose a purification |y,) of o, with the reference system 
Hasyn ® Hep,n such that E,(o,) = H (Trg |yn){yn|). Due to the discussion in the 
proof of Lemma 8.13, we can assume that 


dim Ha, < (dim H,4,, dim H,,,)°. (8.358) 


Based on Lemma 8.2, we choose a purification |x,) of p, such that b(|x,)(xnl, 
vn) (Ynl) = b(~n, On). Hence, due to Exercises 3.24 and 3.25, we have 


Qi (|Xn) (nls 1Yn) (Ynl) > O (8.359) 


because d1(~n, On) — O. Thus, (5.92) in Theorem 5.12, (8.358), and (8.359) imply 
that 
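$$\frac{1}{\log\max(\dim\mathcal{H}_{A,n},\dim\mathcal{H}_{B,n})}\,\big|H(\mathrm{Tr}_B|x_n\rangle\langle x_n|) - E_p(\sigma_n)\big|$$
$$= \frac{1}{\log\max(\dim\mathcal{H}_{A,n},\dim\mathcal{H}_{B,n})}\,\big|H(\mathrm{Tr}_B|x_n\rangle\langle x_n|) - H(\mathrm{Tr}_B|y_n\rangle\langle y_n|)\big| \to 0.$$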


Hence, combining the discussion in (b) of Exercise 8.42, we obtain the condition 
E3. 


Exercise 8.53 Due to (8.173), it is sufficient to show 
Es(p®) + Ey(o®*) = Es(p?® @ a? ), (8.360) 


Since p is separable, p?-* satisfies the condition for Exercise 8.46. Hence, we obtain 
(8.360) 


Exercise 8.54 For any bipartite pure state |) (u|, the entanglement formation E + 
between the system 7/g and the reference is zero. So, (8.173) guarantees that 
Ce” thy) (u|) = H(Trag |u)(u|), which implies Condition E1’. 

Consider a bipartite state p on 714 ® Hp, and TP-CP maps «4 and kg on H, and 
7H. Hence, there exists a POVM {M;} on 7, satisfying the following equation (a). 
Then, the following equation (b) follows from the definition of the dual map «% and 
the following inequality (c) follows from Exercise 5.46. 


Cr cg @ Kp(p)) 


LH (Ke(p")) — D2 Tr Mika (o*)H ( 


Tra(M; ® Ig) (Ka ® “2p) 
Tr Mjka(p4) 


kg (Tra(K4(M;) ® =?) ) 
Tr «4 (M;)p4 


PH (ka (0) — D2 Tre (M,) pH ( 


(c) 1 
<H(p*) — > «4 (M;)p“H Go Tra(K4(M;) ® In)p) 
i Aw! 


20," 


Exercise 8.55 The dimension of the reference system is less than dim 7, dim 7/3. 
H(p®) satisfies Condition E3. Thus, due to Relation (8.173), Condition E3 for 
C/> 8 (p) follows from Condition E3 for E ¢(p®"*). 


Exercise 8.56 For a POVM {MM} on H4, we choose another POVM {M;,;} on H4 
such thatrank Mj; = land >"; M;,; = M;.Since Tr4(M;@Ig)p) = ba Tra(M;,;® 
Tz) p), the concavity of von Neumann entropy yields 

H(p®) — 5° Tr Mjp*H ( 


1 
—— Tr,(M; ® I 
rags TAM @ »»)) 


1 
<H | p®)- PP: Mi.j0" Hapa Tr4(M;,; ® Ip)p) 
i,j Bd 
Exercise 8.57


(a) {M;} is given by a CONS {|u;)}. Hence, [,(A : B) = H,(B) — H, (BIA) = 
H (Tra |X))((X|) — 0; Te XM; X* A (sq XMiX") = H (Tr |X))((X1). 

(b) Let p be a separable state of the form (8.180) with rank pe = |. Then, we 
denote pp by |x? ) Ge |. We define the map X . > alba ae ) Ge, Then, the state 
> M; ® XM; X* is p. Due to (a), we have H(Tr, p) = 1,(A: B). 

(c) Now, we denote the original systems 7{4 and 7H, by Ha, and H,,. We 
choose a purification |x) of p with the reference system 714, ® 71, such that 


E,(p) = A(Tra ea). Using the CONS {lui')} on 714,, we define PVM 


Me {M;} with M,; & = uA a, Then, the pinching map «Ky given in (1.13) 


ati Tra, Ku (|x)(x|) = Tra, |x)(x|. Since Ky(\x)(x|) can be written as the 
form >); [ué) (uf | ® pj BiBa We have 


Tra, ay [x) (x| = DTA goes (8.361) 


Since rank Tr 4, 2, por = = |, the state 5°, Tra, p da is separable as a bipartite 
state on Hg, © Hz,. Thus, Theorem 8.3 guarantees that H(>°; Tra, a Ar, Ba) > >H 
(>), Tra,.a, 0/77") = A (Tra, p). Hence, (8.361) implies that E,(p) = H (Tra, p). 


Exercise 8.58 Using (8.173), we have 


D(B|A)» = H(p*) + H(p®) — H(p**) — (H(p) — E¢(?*)) 
=H (p*) — H(p4*®) + Ep(p?®) = H(p**) — H(p*) + Ey (p?*) 
=H,se(B|R) + E,/(p?’). 


Beers - 59 Given a state p on H, @ Hp, we choose a state p,gr with the form 
>, pres? ® |u®)(u®| satisfying the condition given in (8.200). We also choose 
operations &,4 and Kg on 7, and Hz, respectively. Then, we have Ii, ,@5)(pagn)(A : 
B\E) < I,,,,(A : B|E) < 6. We can check that the state (k4 ® &g)(pape) Sat- 
isfies other conditions in (8.200) for (k4 ® Kg)(pap). Since I,,,,(AB : E) = 
T,@ep)(pagr)(AB : E), we have the first inequality of (8.204). Similarly, we can 
show the second inequality of (8.204). 


Exercise 8.60 


(a) Chain rule for the conditional mutual information (5.109) implies that 


0 = I,sse(A, Az : By Bo|E) 
=T)ase(Ay : By Bo|E) + I,sve (Ay : By By| AE) 
=[ sve (Ay : By|E) + [sve (Ay : Bo| Bi E) 

+ [ave (Az : By|A,E) + Ipase (Az : Bo|A1 Bi E). 
Since the conditional mutual information is non-negative, we obtain the desired
argument. 

(b) We show only $C(\rho^{A_1B_1}\otimes\rho^{A_2B_2}) \ge C(\rho^{A_1B_1}) + C(\rho^{A_2B_2})$ because the opposite
inequality is obvious. For this purpose, we show


C(et™ @ p20) = Cp"™™, 0) + C(p*™, W. (8.362) 


As shown in (a), when an extension p48" of p4!4: @ p42” satisfies the condition in 
(8.201) with 5 = 0, the state Tr4,,, p48" satisfies the condition for p“':8' and the 
state p“¥ satisfies the condition for p“?'? by regarding the system Hs, @Hs, @He 
as the environment. 

Since Has: (Az Bz AB ) = H,yase (Az Bz) + H,ase (A; Bi), we have 


T ase (Az Bo ei E|A,B,) 
= Hyase(A2B2 A,B) + Hyase (EA, B;) 

— Ayase (E A2BzA,B,) — Hyase (A, By) 
=H yase (Az Bz) + Hyase (EAB) — Hyase (E Az B2A\ By) 
= pike (Ar Bo : EA,B)). 


Thus, chain rule for the mutual information (5.108) implies that 


T,ase (A, A2B, Bo : E) = Tyase (A, By : E) + T,ase(A2 Bo Z E|A,B)) 
=[ ase (A, By :E)+ Tase (Az By : EA,B}). 


Hence, we obtain (8.362). 
Exercise 8.61 


(a) Use Exercise 5.43. 

(b) It is sufficient to show C(p,0) < C (p, 0) because the opposite inequality is
obvious. Choose an extension p“8¥ of p satisfying I,ase(A : BJE) = 0. Then, (ky ®
taB)(p*8) is an extension p“8 of p satisfying the conditions for C(p, 0). Exercise 
5.42 implies that [(¢je045)(p48#)(AB : E) < I,sse(AB : E). Hence, C(p,0) < 
E(p, 0). 


Exercise 8.62 Choose the unitary U as the unitary matrix transforming each element
of the first basis to the corresponding element of the second basis.


Exercise 8.63 7’ 0 oT is a CP map if and only if the following holds for any integer 
n: The inequality Tr or’ 0&0 T(p) => O holds for any states p and o on H ® C” and 
H' ® C". Let r” denote the transpose on C”. Then, since rT” commutes «, 


Tr(7’ @T")(a)T" oK oT" ((T @T")(p)) = Tr(7’ @ T")(a)K(T @ 7”)(p)). 


Since (7 ® r")(p) and (7’ ® r")(c) are states on on H @ C” and H’ ® C", we obtain 
the desired equivalence. 
The second argument can be shown by (A.18) as follows.
$$\|X\|_1 = \max_{U:\,\mathrm{unitary}}\mathrm{Tr}\,UX = \max_{U:\,\mathrm{unitary}}\mathrm{Tr}\,\tau(U)\tau(X) = \|\tau(X)\|_1.$$
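
The invariance of the trace norm under the transpose can be checked directly for $2\times 2$ matrices, where the trace norm has the closed form $\sqrt{\|X\|_F^2 + 2|\det X|}$ (the two singular values satisfy $s_1^2 + s_2^2 = \|X\|_F^2$ and $s_1s_2 = |\det X|$). The sketch below relies on this closed form; the test matrix is an arbitrary illustrative choice.

```python
import math

def trace_norm_2x2(m):
    """Sum of singular values of a 2x2 complex matrix: sqrt(||m||_F^2 + 2|det m|)."""
    fro2 = sum(abs(z) ** 2 for row in m for z in row)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return math.sqrt(fro2 + 2 * abs(det))

def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

X = [[1 + 2j, 0.5], [-1j, 3]]  # hypothetical non-Hermitian example
norm_X = trace_norm_2x2(X)
norm_XT = trace_norm_2x2(transpose(X))
print(norm_X, norm_XT)
```

Both the Frobenius norm and the determinant are unchanged by transposition, which is why the two values coincide.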


Exercise 8.64 Due to Exercise 8.63, 7“ 0074 is TP-CP if and only if (74 @7?’)o 
TB oko0T® 0 (74 @T®) is TP-CP. Since (74 @T? )oT® oK0TF 0 (74 QT?) = 
T® oko 7%, we obtain the desired equivalence. 


Exercise 8.65 Due to Exercise 8.64, the completely positivity of r4’ o Ko r4 is 


equivalent with that of T? o & oT. Since this equivalence does not depend on the 
choice of the bases on H, and 4’, we obtain the desired equivalence. 


Exercise 8.66 The second argument of Exercise 8.63 implies that 


I74(ola = Ir4 @ 74)74 (PY) = IIT? 


Since the above equation holds for any basis on (4, we obtain the desired equation. 


Exercise 8.67 Equation (8.239) can be shown in the same way as Exercise 8.50 by
replacing the role of (8.84) in the proof of Theorem 8.7 with that of (8.233) in the
proof of (8.223).


Exercise 8.68 


(a) 
Trop =Trr4(a)r4(p) < Ir4(@INI74 (Pll. 


(b) Since ||74(«(c)) ||) = Tr 74(K(c)) = 1, using (8.241) and the second equation 
of (8.219), we have 


1 
max Tr|®4)(Pala(o) = pas 74a) (Pa Illi“) < Zz 


(c) Since T4(c) > O and 74(\u) (ul) < |74(\u) (u|)|, we have 


max Tr |x)(x|o = max Tr 74 (|x) (x|)74(o) 
o€ Sppr 


< max eee 
o€ Sper 


@2 
(a) A 1 
= max Tr Aj|uj) (uj =N, 
max (= Vi lui) ) To) = dj 
where (a) follows from (8.217). The equality follows from the choice of o given in 
(b) of Exercise 8.37. 

(d) We have 
max {log L()| €2(k, p) = 0} = max flog d| Tr K(|x) (x) 1a) (Pal = 1


max { min (Tr|g)(ale(o)) "| Te s(x) (x) 1a) (Pal = 1 


KESppp o€Sppr 


(b) . = = 
< min (Tr|x)(xJo)! 2 AY, 
o€Sppr 


where (a), (b) and (c) follow from (8.242), the same discussion as (8.96), and (8.243), 
respectively. The PPT operation achieving the rate — log x is given in Exercise 8.31. 


Exercise 8.69 


(a) Since Trfe"4 +9 < pO" < e~MH()—9} 98" _, 1, we have 


{eM +9 < Br < e-nH)-9} ) 


Qn _ = @n 
lee" = Pall: = Trp (1 Tile MAOHO < 98" < e-O=O} 58" 


=Tr por _ {eA o+o < po < e MA(P)—9 1) 


1 
- (1 Tr{eA+o < pan < ae 


{eH +9) < 8" < saeeo)) 
—>0. 


(b) All of eigenvalues of Tr{e""\4#+9 < p®" < e~"4(P)—9} 58" 5, belong to the 
interval [e~"\4 +9, e-"4()—9]. So, for sufficiently large n, all of eigenvalues of 
Pn belong to the interval [e~"\4) +9, e~"(4)-29]_ Hence, Exercise 2.27 guarantees 
the desired inequality. 

(c) Choose purifications x,, yy of p®”, py, such that F(|Xn)(Xnl, Yn) (val) = F(p®", 
Pn). Hence, (a) and (3.52) guarantee that dj (|Xn)(Xn|, |Yn)(yn|) — 0. Thus, we find 
that the purifications x,, y, of p®", p, give a counterexample of the continuity of 
2 log ||74 (x) (x|)|]1. However, (8.218) and (a) imply that 


=) A 
— log |“ (lan) %nl)Ih = Hy (0) 


—2 1 
=| 8 Ir“ yn) (Ya DU = i (On) S H(p) +e. 


: : C B\ fA+C 0 AB 
Exercise 8.70 Since _BtA = ( 0 anc) ~ (gee) =o we have 


(:8)-(2 SDE 


Exercise 8.71 Under the correspondence (8.229), we have 
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Dy45(pllo) + log ||74 (0) |) = Diss (pllo’). 


Hence, similar to (8.231), we have (8.246). Similarly, we can show (8.247). 
Exercise 8.72 To show the inequality (8.249), we consider the state 4p,|0, 0)(0, O|+ 


(1 — A)pall, 1)(1, 1]. Choose oj := argmin,.9.)-4(¢y),=1D1+s(pllo’). Applying the 
monotonicity (a) of Exercise 5.25 to the partial trace, we have 


eo Fi4s| sppApit+(—A)p2) < eo PitsOpi +(1—A)p2||Ao, +U—A)o5) 


<5 Pits (Ap1810,0) (0,0|+ A) p2@11,1) (1, HIAo} 10,0) (0,014 —A)o3@ 11,1) (1,1) 


=e Pres(eilloy) 4 a= Ajo Pi+s(P2llo2) 


— \e5 E1+sispp(p1) +(1— Ayes Eitsispp (2) 


Other inequalities can be shown in the same way. 


Exercise 8.73 Due to (8.233), when a separable operation «,, satisfies that L(K,) = 
e””, any state o satisfies that 


=D1 45 (plla)—log lir4 (Il, +r 
ps Ts 


(Dear |i (p®")| Ben) < eo" (8.363) 
for s > 0. Taking the minimum for a, we have 
4 = —E1+5| spp (p)+sr 
(Don |) (p®")|Pew) Se", (8.364) 


which implies (8.252). Since the same discussion holds with D,,,(pllo), we obtain 
(8.253). 


Exercise 8.74 


(a) When 6 = d (x, y), Exercise 3.18 guarantees that 


lIlx)l —lyMyllh _ 2V1—lly)P? _ 2V1—cos?@ _ 2sind @ 


d(x, y) d(x, y) 7 0 ey Os 


where (a) follows from ane <i. 
(b) (8.278) and triangle inequality imply that 


[| Tre |x) (x] — pmix,all3 — Il Teor Ly)(yl — pmix,all3| 
d(x, y) 
_ Tra(Trg [x)(x|)? — 2 Tra (Trg |x) (x1) pmix,a 
7 d(x, y) 
4s —Tra(Trg |y)(y|)? + 2 Tra (Trg |y) (yl) Pmix,A 
d(x, y) 
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_ Tale |x)(x|) — (Ire |y) (yD Cre |x) (x) + (Tre ly) (yl) — 2p mix, a] 
d(x, y) 
_ | Tre be) x] = Tra ly) (yi Trae be) x) + te Ly) = 2Pmix,all 
d(x, y) 


gollanial — ly)(yilh Oy 
d(x, y) 


where (a) follows from (a). 


Exercise 8.75 


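(a) Consider the matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then, we have

$$A^{T} S_2 A = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 0 & -i(ad - bc) \\ i(ad - bc) & 0 \end{pmatrix} = (\det A)\, S_2.$$

(b) We have

$$\begin{aligned}
&(A \otimes B)\rho(A \otimes B)^{*}(S_2 \otimes S_2)\overline{(A \otimes B)\rho(A \otimes B)^{*}}(S_2 \otimes S_2) \\
&= (A \otimes B)\rho(A^{*} \otimes B^{*})(S_2 \otimes S_2)(\bar{A} \otimes \bar{B})\bar{\rho}(A^{T} \otimes B^{T})(S_2 \otimes S_2) \\
&= (\det \bar{A})(\det \bar{B})(A \otimes B)\rho(S_2 \otimes S_2)\bar{\rho}(A^{T} \otimes B^{T})(S_2 \otimes S_2)(A \otimes B)(A \otimes B)^{-1} \\
&= (\det \bar{A})(\det \bar{B})(\det A)(\det B)(A \otimes B)\rho(S_2 \otimes S_2)\bar{\rho}(S_2 \otimes S_2)(A \otimes B)^{-1} \\
&= |\det A|^2 |\det B|^2 (A \otimes B)\rho(S_2 \otimes S_2)\bar{\rho}(S_2 \otimes S_2)(A \otimes B)^{-1}.
\end{aligned}$$

(c) The eigenvalues of $(A \otimes B)\rho(S_2 \otimes S_2)\bar{\rho}(S_2 \otimes S_2)(A \otimes B)^{-1}$ are the same
as those of $\rho(S_2 \otimes S_2)\bar{\rho}(S_2 \otimes S_2)$. Hence, due to (b), the eigenvalues of
$(A \otimes B)\rho(A \otimes B)^{*}(S_2 \otimes S_2)\overline{(A \otimes B)\rho(A \otimes B)^{*}}(S_2 \otimes S_2)$ are the same as those of
$(|\det A||\det B|\rho)(S_2 \otimes S_2)(|\det A||\det B|\bar{\rho})(S_2 \otimes S_2)$. Thus, $C_o((A \otimes B)\rho(A \otimes B)^{*}) = C_o(|\det A||\det B|\rho)$.
Since $C_o(c\rho) = c\,C_o(\rho)$ for any constant $c > 0$, we obtain the desired argument.
(d) Substituting $A_\omega$ and $B_\omega$ into $A$ and $B$, we obtain the desired argument.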


Exercise 8.76 


(a) The definitions of \eA8 ) and |} ) in (1.20) and Exercise 8.34 imply that 


ABy/ ,AB A,By; A,B ABy, AB A.By, A,B 
leo ) (eg [= |Uo.9 )(Uo.0 I, le; )(e} |= |ui 9 (uy 6 | 
ABy/ ,AB A,By; A,B ABy, AB A.By, A,B 
|e5 )(e5 = |} 1 (uy |; |e3 ) (3 [= |Uo )(Ug.1 |. 
Since all of entries of |i") are real numbers, |e4)(e48| = |eA3)(e43|, which 


implies that Ppet,p = PBell,p- 
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(b) It is sufficient to show ($2 ® S>)|e*?) = let), Due to the definition (1.20) of 
le ), the relation follows from the relation SS; st = §;. 
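(c) The statement (b) implies that

$$\rho_{\mathrm{Bell},p}(S_2 \otimes S_2)\bar{\rho}_{\mathrm{Bell},p}(S_2 \otimes S_2) = \rho_{\mathrm{Bell},p}^2.$$

So, the eigenvalues of $\rho_{\mathrm{Bell},p}(S_2 \otimes S_2)\bar{\rho}_{\mathrm{Bell},p}(S_2 \otimes S_2)$ are $p_0^2$, $p_1^2$, $p_2^2$, and $p_3^2$. That
is, the square roots are $p_0$, $p_1$, $p_2$, and $p_3$. Hence, $C_o(\rho_{\mathrm{Bell},p})$ is $(2\max_i p_i) - 1$.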


Exercise 8.77 
(a) Itis sufficient to show that Tr(A @ B) |e?) (eA? |(A@ B)* — $Tr AY AS? BT BS’. 
We have 
Tr(A @ B)le#?)(eA7|(A @ B)* = (e?|(A @ B)*(A @ B) le”) 
=(e6?|(S; @ 1)*(A @ B)*(A @ B)(S; @ I)|e5”) 
=(e)?|(AS; ®@ B)*(AS; ® B)\e5*) = (ef? |(AS;B’ @ 1)*(AS;B" @ Die”) 


1 1 2 
=5 Tr(AS, B’)* AS; B’ = 5 Tr A* AS; B’ BS;. 
(b) For any positive semidefinite matrix C, we have +Trc > det C. Hence, 
5 Tr A*AS;B7 BS; = + Tr AS; B7 BS? A* > \/det AS;B? BST A* = | det A|| det B]. 


(c) The statements (a) and (b) imply that aoe “asa S 1. Hence, (8.318) 
yields (8.320). 


Exercise 8.78 From the definition (8.141), any maximally correlated state p can be 
written as 


P =a|Up, Uo) (Uo, Uo| + t\Uo, Uo) (U1, U1 | 


+ t|uj, U1) (uo, Uol + CL — a) uy, 1) (ud, U1 


with a > Oandt € C. Assume that t = be’ with b > 0. We choose the new basis 
|0) := |uo) and |1) = e~|u1). Then, p = pa.p. 


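Exercise 8.79
(a) Since $(S_2 \otimes S_2)|00\rangle = -|11\rangle$ and $(S_2 \otimes S_2)|11\rangle = -|00\rangle$, we have
$(S_2 \otimes S_2)\bar{\rho}_{a,b}(S_2 \otimes S_2) = (S_2 \otimes S_2)\rho_{a,b}(S_2 \otimes S_2) = \rho_{1-a,b}$.

(b) Since

$$\begin{pmatrix} a & b \\ b & 1-a \end{pmatrix}\begin{pmatrix} 1-a & b \\ b & a \end{pmatrix} = \begin{pmatrix} a(1-a)+b^2 & 2ab \\ 2(1-a)b & a(1-a)+b^2 \end{pmatrix},$$

the statement (a) implies that

$$\begin{aligned}
\rho_{a,b}(S_2 \otimes S_2)\bar{\rho}_{a,b}(S_2 \otimes S_2) &= \rho_{a,b}\,\rho_{1-a,b} \\
&= (a(1-a)+b^2)|00\rangle\langle 00| + 2ab|00\rangle\langle 11| \\
&\quad + 2(1-a)b|11\rangle\langle 00| + (a(1-a)+b^2)|11\rangle\langle 11|.
\end{aligned}$$

(c) The eigenvalues of $\begin{pmatrix} a(1-a)+b^2 & 2ab \\ 2(1-a)b & a(1-a)+b^2 \end{pmatrix}$ are
$a(1-a) + b^2 \pm 2b\sqrt{a(1-a)} = (\sqrt{a(1-a)} \pm b)^2$ because $\sqrt{a(1-a)} \ge b$. Thus,

$$C_o(\rho_{a,b}) = \sqrt{a(1-a)} + b - (\sqrt{a(1-a)} - b) = 2b.$$


Exercise 8.80 The eigenvalues of $\rho_{a,b}$ are $\frac{1 \pm \sqrt{(2a-1)^2 + 4b^2}}{2}$. Hence, we have
$H(\rho_{a,b}) = h\left(\frac{1 + \sqrt{(2a-1)^2 + 4b^2}}{2}\right)$.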
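
The closed forms in Exercises 8.79 and 8.80 can be checked numerically. The following is a minimal sketch, assuming that the concurrence is computed from the square roots of the eigenvalues of $\rho(S_2 \otimes S_2)\bar{\rho}(S_2 \otimes S_2)$ with the usual truncation at zero; the helper names `rho_ab` and `concurrence` and the sample values of $a$ and $b$ are illustrative choices, not notation from the text.

```python
import numpy as np

def rho_ab(a, b):
    """State a|00><00| + b|00><11| + b|11><00| + (1-a)|11><11| (real b >= 0)."""
    rho = np.zeros((4, 4), dtype=complex)
    rho[0, 0] = a
    rho[0, 3] = b
    rho[3, 0] = b
    rho[3, 3] = 1 - a
    return rho

def concurrence(rho):
    """Concurrence from the eigenvalues of rho (S2 x S2) conj(rho) (S2 x S2).
    The max(0, .) truncation is the standard convention and is an assumption here."""
    s2 = np.array([[0, -1j], [1j, 0]])
    ss = np.kron(s2, s2)
    m = rho @ ss @ rho.conj() @ ss
    ev = np.linalg.eigvals(m).real
    # Clip tiny negative values caused by rounding before taking square roots.
    roots = np.sort(np.sqrt(np.clip(ev, 0.0, None)))[::-1]
    return max(0.0, roots[0] - roots[1:].sum())

a, b = 0.3, 0.2                      # sample values with sqrt(a(1-a)) >= b
c = concurrence(rho_ab(a, b))        # Exercise 8.79 predicts 2b
largest = np.linalg.eigvalsh(rho_ab(a, b)).max()
predicted = (1 + np.sqrt((2 * a - 1) ** 2 + 4 * b ** 2)) / 2   # Exercise 8.80
```

Here `c` should agree with $2b$ and `largest` with the larger eigenvalue stated in Exercise 8.80, up to rounding error.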


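Exercise 8.81 The relation (5.12) implies that

$$\begin{aligned}
(\kappa_{d,\lambda} \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|) &= \lambda|\Phi_d\rangle\langle\Phi_d| + (1-\lambda)\rho_{\mathrm{mix},A} \otimes \operatorname{Tr}_A |\Phi_d\rangle\langle\Phi_d| \\
&= \lambda|\Phi_d\rangle\langle\Phi_d| + (1-\lambda)\rho_{\mathrm{mix},A} \otimes \rho_{\mathrm{mix},R} = \rho_{I,\frac{1+\lambda(d^2-1)}{d^2}}.
\end{aligned}$$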

Similarly, (5.18) and the first equation in (8.219) imply that 


(Kg, @ tr) ((®a) (Pal) = ATA (l@a) (Pal) + = A)pPmix,a @ Pmix,z 
A 1-A 


=a + ae = Py, da@sd-n . 


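Exercise 8.82
(a) The first equation in (8.219) implies that

$$qI + r\tau^{A}(|\Phi_d\rangle\langle\Phi_d|) = qI + \frac{r}{d}F = \rho_{W,p}.$$

(b) We have

$$\begin{aligned}
\tau^{A}(\rho_{W,p}) &= qI + r\tau^{A}(\tau^{A}(|\Phi_d\rangle\langle\Phi_d|)) = qI + r|\Phi_d\rangle\langle\Phi_d| \\
&= q(I - |\Phi_d\rangle\langle\Phi_d|) + (q + r)|\Phi_d\rangle\langle\Phi_d|.
\end{aligned}$$

(c) Since $q \ge 0$, the statement (b) implies that $|\tau^{A}(\rho_{W,p})| = q(I - |\Phi_d\rangle\langle\Phi_d|) +
|q + r||\Phi_d\rangle\langle\Phi_d|$. Thus,

$$\begin{aligned}
\tau^{A}(|\tau^{A}(\rho_{W,p})|) &= q(I - \tau^{A}(|\Phi_d\rangle\langle\Phi_d|)) + |q + r|\tau^{A}(|\Phi_d\rangle\langle\Phi_d|) \\
&= q\Big(I - \frac{F}{d}\Big) + \frac{|q + r|}{d}F = qI + \frac{|q + r| - q}{d}F.
\end{aligned}$$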
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Since HH! < Gt) _ = < g, we have r4(|r4(pw,p)|) = qi + EF = 0. 


Exercise 8.83 


(a) Equation (8.329) and the first equation in (8.219) imply that 


1—p d’p—1 a dp—1 
Aor») = I TA (|@y)(® I F 
T” (P1,p) Pol + @ol rid (Pa) (Pal) = BG Geo 

l-—p dp-—1 

=—>.—_U+ F ——F. 

p—1' 7 ean 


(b) Since 4 4 7 >0 and 4 > 0, we have 


= 


ee St ja aes OP 
@—1 dd@—l) | @— d(d—1) 
p= 1 


Hence, combining (a), we obtain the desired argument. 

©) The statement (b) and the first equation in (8.219) imply that 74 (|74 (7, pd= 
Fat’ (F))+ gal = gy +l ©a) (Pal) + Gea I The second inequality 
is trivial. 


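Exercise 8.84 The first equation in (8.219) implies that $(1-\lambda)\rho_{\mathrm{mix}} + \lambda\tau^{A}(|\Phi_d\rangle\langle\Phi_d|)
= \frac{1-\lambda}{d^2}I + \frac{\lambda}{d}F$. Since the eigenvalues of $F$ are $\pm 1$, $\frac{1-\lambda}{d^2}I + \frac{\lambda}{d}F \ge 0$ if and only if
$-1 \le \frac{d\lambda}{1-\lambda} \le 1$, which is equivalent to

$$-\frac{1}{d-1} \le \lambda \le \frac{1}{d+1}.$$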
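
The range obtained in Exercise 8.84 can be probed numerically. The sketch below is only an illustration under stated assumptions: $|\Phi_d\rangle$ is taken to be the maximally entangled vector $\frac{1}{\sqrt{d}}\sum_k |k,k\rangle$, and the function names are hypothetical helpers rather than notation from the text. It applies the partial transpose to $(1-\lambda)\rho_{\mathrm{mix}} + \lambda|\Phi_d\rangle\langle\Phi_d|$ and returns the smallest eigenvalue of the result.

```python
import numpy as np

def partial_transpose_A(rho, d):
    """Transpose the first tensor factor of an operator on C^d (x) C^d."""
    r = rho.reshape(d, d, d, d)                 # indices (a, b, a', b')
    return r.transpose(2, 1, 0, 3).reshape(d * d, d * d)

def depolarized_phi(lam, d):
    """(1 - lam) * rho_mix + lam * |Phi_d><Phi_d| (assumed form of Phi_d)."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)
    return (1 - lam) * np.eye(d * d) / d**2 + lam * np.outer(phi, phi)

def min_eig_after_transpose(lam, d):
    """Smallest eigenvalue of (1 - lam) rho_mix + lam tau^A(|Phi_d><Phi_d|)."""
    return np.linalg.eigvalsh(partial_transpose_A(depolarized_phi(lam, d), d)).min()

d = 3
lower, upper = -1 / (d - 1), 1 / (d + 1)        # endpoints claimed in Exercise 8.84
edge_values = (min_eig_after_transpose(lower, d), min_eig_after_transpose(upper, d))
```

At both endpoints the smallest eigenvalue should vanish up to rounding, and it should become negative just outside the interval.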
Chapter 9
Analysis of Quantum Communication Protocols


Abstract The problems of transmitting a classical message via a quantum channel
(Chap. 4) and estimating a quantum state (Chaps. 3 and 6) have classical analogs.
These are not intrinsically quantum-specific problems but quantum extensions of
classical problems. The difficulties of these quantum extensions are mainly caused
by the non-commutativity of quantum mechanics. However, quantum information
processing is not merely a non-commuting version of classical information processing.
There exist many quantum protocols without any classical analog. In this sense,
quantum information theory covers a greater field than a noncommutative analog of
classical information theory. The key to these additional effects is the advantage of
using entanglement, treated in Chap. 8, where we examined mainly the quantification
of entanglement. In this chapter, we will introduce several quantum communication
protocols that are possible only by using entanglement and are therefore classically
impossible. (Some of the protocols introduced in this chapter have classical analogs.) We
also examine the transmission of quantum states (quantum error correction), communication
in the presence of eavesdroppers, and several other types of communication
that we could not handle in Chap. 4. As seen in this chapter, the transmission of a
quantum state is closely related to communication with no information leakage to
eavesdroppers. The noise in the transmission of a quantum state clearly corresponds
to the eavesdropper in a quantum communication.


9.1 Quantum Teleportation 


The curious properties of entangled states were first examined by Einstein et al. [1] in
an attempt to show that quantum mechanics was incomplete. More recently, entangled
states have been treated in a manner rather different from when they were first introduced.
For example, by regarding these states as the source of a quantum advantage, Bennett
et al. [2] proposed quantum teleportation. Since this topic can be understood without
any complicated mathematics, we introduce it in this section.

In quantum teleportation, an entangled state is first shared between two parties.
Then, by sending a classical message from one party to the other, it is possible to
transmit a quantum state without directly sending it. Let us look at this protocol in
more detail. First, we prepare an entangled state $e_0^{A,B} = \frac{1}{\sqrt{2}}(u_0^A \otimes u_0^B + u_1^A \otimes u_1^B)$
on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ composed of two qubits $\mathcal{H}_A$ and $\mathcal{H}_B$ spanned by
$u_0^A, u_1^A$ and $u_0^B, u_1^B$, respectively. The sender possesses a qubit $\mathcal{H}_C$ spanned by $u_0^C, u_1^C$
as well as the quantum system $\mathcal{H}_A$. The sender wishes to send the state of the qubit $\mathcal{H}_C$ to the receiver. The
receiver possesses the other qubit $\mathcal{H}_B$. Then, we have the following theorem.

Theorem 9.1 (BBCJPW [2]) Let the sender perform a measurement corresponding
to the CONS $e_i^{A,C} := (I_A \otimes S_i)e_0^{A,C}$ $(i = 0, 1, 2, 3)$ on the composite system $\mathcal{H}_A \otimes
\mathcal{H}_C$ and its result be sent to the receiver. [From (1.21), it satisfies the conditions for
a PVM.] Let the receiver perform a unitary time evolution corresponding to $S_i$ on
the quantum system $\mathcal{H}_B$. Then, the final state on $\mathcal{H}_B$ is the same state as the initial
state on $\mathcal{H}_C$.


This argument holds irrespective of the initial state on 7c and the measurement 
outcome i, as proved below. 


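Proof Let us first consider the case where the measurement outcome is 0. Let the
initial state on $\mathcal{H}_C$ be the pure state $x = \sum_i x^i u_i^C$. Then, the state on the composite
system $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$ is $\frac{1}{\sqrt{2}}\sum_{i,j} x^i u_j^A \otimes u_j^B \otimes u_i^C$. Therefore, the final state
on $\mathcal{H}_B$ is $\sum_{i,j,k} \frac{1}{2} x^i \delta_{k,i}\delta_{k,j} u_j^B = \sum_k \frac{1}{2} x^k u_k^B$, following Exercise 7.4. Normalizing
this vector, we can show that the final state on $\mathcal{H}_B$ equals $\sum_k x^k u_k^B$, which is
the same state as the initial state on $\mathcal{H}_C$.

Now, consider the case in which the measurement outcome $i$ is obtained. Since
$S_i^* = S_i$,

$$\begin{aligned}
&\operatorname{Tr}_{A,C}\big(|e_i^{A,C}\rangle\langle e_i^{A,C}| \otimes I_B\big)\,|x \otimes e_0^{A,B}\rangle\langle x \otimes e_0^{A,B}| \\
&= \operatorname{Tr}_{A,C}(S_i \otimes I_{A,B})\big(|e_0^{A,C}\rangle\langle e_0^{A,C}| \otimes I_B\big)(S_i^* \otimes I_{A,B})\,|x \otimes e_0^{A,B}\rangle\langle x \otimes e_0^{A,B}| \\
&= \operatorname{Tr}_{A,C}\big(|e_0^{A,C}\rangle\langle e_0^{A,C}| \otimes I_B\big)\,|(S_i x) \otimes e_0^{A,B}\rangle\langle (S_i x) \otimes e_0^{A,B}| = \frac{1}{4}|S_i x\rangle\langle S_i x|,
\end{aligned}$$

where $S_i \otimes I_{A,B}$ denotes $S_i$ acting on $\mathcal{H}_C$. Operating $S_i$ on $\mathcal{H}_B$ ($S_i$ is its own inverse), the final state on $\mathcal{H}_B$ is $x$. ∎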


It is noteworthy that this protocol has been experimentally demonstrated [3-5]. 
Other protocols that combine quantum teleportation with cloning have also been 
proposed [6]. 
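
The statement of Theorem 9.1 can be simulated directly for qubits. The following is a minimal sketch under stated assumptions: $S_0, \ldots, S_3$ are taken to be the identity and the three Pauli matrices, the global vector is stored in the tensor order $C \otimes A \otimes B$, and the function name `teleport` is a hypothetical helper. It projects the systems $A$, $C$ onto $e_i^{A,C} = (I_A \otimes S_i)e_0^{A,C}$ and then applies the correction $S_i$ on $B$.

```python
import numpy as np

# S_0..S_3: identity and Pauli matrices (assumed to match the S_i of the text)
S = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

# e_0 = (|00> + |11>) / sqrt(2)
e0 = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

def teleport(x, i):
    """Return (probability of outcome i, corrected state on B).
    x is the normalized input vector on C; tensor order is C (x) A (x) B."""
    psi = np.kron(x, e0).reshape(2, 2, 2)            # indices (c, a, b)
    # (I_A (x) S_i) e_0^{A,C}, written as a matrix with indices (c, a)
    bell = S[i] / np.sqrt(2)
    out = np.einsum('ca,cab->b', bell.conj(), psi)   # apply <e_i^{A,C}| on (C, A)
    prob = np.vdot(out, out).real
    corrected = S[i] @ out / np.sqrt(prob)           # receiver applies S_i
    return prob, corrected

x = np.array([0.6, 0.8j])                            # arbitrary normalized input
results = [teleport(x, i) for i in range(4)]
```

Each outcome should occur with probability $1/4$, and the corrected vector should coincide with `x` regardless of the outcome, in line with the proof above.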


Exercise 
9.1 Show that quantum teleportation in any dimension $d$ is possible by following the
steps below. Let $\mathcal{H}_A$, $\mathcal{H}_B$, $\mathcal{H}_C$ be the spaces spanned by $u_1^A, \ldots, u_d^A$, $u_1^B, \ldots, u_d^B$, $u_1^C, \ldots, u_d^C$,
respectively. Prepare an entangled state $u_{0,0}^{A,B} := \frac{1}{\sqrt{d}}\sum_{i=1}^{d} u_i^A \otimes u_i^B$ in $\mathcal{H}_A \otimes
\mathcal{H}_B$. Now perform a measurement corresponding to $\{u_{i,j}^{A,C} := (I_A \otimes X_C^i Z_C^j)u_{0,0}^{A,C}\}_{i,j}$,
and then an operation $X_B^i Z_B^j$ depending on the measurement outcome $(i, j)$. Show
that the final state on $\mathcal{H}_B$ is the same as the initial state on $\mathcal{H}_C$. (For the definitions
of $X$ and $Z$, see Example 5.8.)
Fig. 9.1 C-Q channel coding with entangled inputs (original message, encoder, decoder, recovered message)


9.2 C-Q Channel Coding with Entangled Inputs 


In this section, we treat classical message transmission via a quantum channel $\kappa$ from
$\mathcal{H}_A$ to $\mathcal{H}_B$. When we use only tensor product states in $\mathcal{S}(\mathcal{H}_A^{\otimes n})$, the problem becomes
that of classical-quantum (c-q) channel coding discussed in Chap. 4 by setting $\mathcal{X}$ to
$\mathcal{S}(\mathcal{H}_A)$ and $W_\rho := \kappa(\rho)$. However, when we are allowed to use any state in $\mathcal{S}(\mathcal{H}_A^{\otimes n})$
as the input state, our problem cannot be regarded as a special case of Chap. 4. In
this case, the optimal rate of sending classical information is called the classical
capacity, which is given by

$$C_c(\kappa) = \max_{p \in \mathcal{P}(\mathcal{S}(\mathcal{H}_A))} I(p, \kappa). \tag{9.1}$$

When we are allowed to send any state entangled between $n$ systems of the input, the
classical capacity is $C_c(\kappa^{\otimes n})/n$. When any entangled state is available as an input
state, as in Fig. 9.1, the code $\Phi^{(n)} = (N_n, \varphi^{(n)}, Y^{(n)})$ is expressed by the triplet of the
size $N_n$, the encoder $\varphi^{(n)}$ mapping from $\{1, \ldots, N_n\}$ to $\mathcal{S}(\mathcal{H}_A^{\otimes n})$, and the POVM
$Y^{(n)} = \{Y_i^{(n)}\}_{i=1}^{N_n}$ taking values in $\{1, \ldots, N_n\}$ on the output space $\mathcal{H}_B^{\otimes n}$. The error
probability is given by

$$\varepsilon[\Phi^{(n)}] := \frac{1}{N_n}\sum_{i=1}^{N_n}\Big(1 - \operatorname{Tr} Y_i^{(n)} \kappa^{\otimes n}(\varphi^{(n)}(i))\Big).$$

Then, we can show the following theorem by using Theorem 4.1 and the discussion
in the proof of Theorem 4.2 (Exercise 9.3).

Theorem 9.2 Define the entanglement-assisted classical capacity $C_c^e(\kappa)$ [1]:

$$C_c^e(\kappa) := \sup_{\{\Phi^{(n)}\}} \Big\{ \lim \frac{1}{n}\log N_n \,\Big|\, \lim_{n\to\infty} \varepsilon[\Phi^{(n)}] = 0 \Big\} \tag{9.2}$$

for a quantum channel $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$. Then, we have

$$C_c^e(\kappa) = \sup_n \frac{C_c(\kappa^{\otimes n})}{n} = \lim_{n\to\infty} \frac{C_c(\kappa^{\otimes n})}{n}.$$

[1] The superscript e of $C_c^e$ indicates that "entangled" input is allowed.

Since the inequality

$$C_c(\kappa^{\otimes n}) \ge n\,C_c(\kappa) \tag{9.3}$$

holds, the inequality

$$C_c^e(\kappa) \ge C_c(\kappa) \tag{9.4}$$

holds. For some time, many researchers [7–12] conjectured the additivity of the classical
capacity for two arbitrary TP-CP maps $\kappa^1$ and $\kappa^2$:

$$C_c(\kappa^1) + C_c(\kappa^2) = C_c(\kappa^1 \otimes \kappa^2), \tag{9.5}$$

which implies the equality in (9.3). Here, remember the relation (8.173). This relation
indicates the relation between the classical capacity and the entanglement of formation.
Matsumoto et al. [13], Shor [14], and Pomeransky [15] showed the equivalence of
the additivity of the classical capacity (9.5) and the additivity of the entanglement of
formation. However, as stated in Sect. 8.13, the additivity of the entanglement of formation
does not hold in general. Hence, the additivity of the classical capacity (9.5)
also does not hold in general. Also, the equality in (9.3) does not hold in general
[16].

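Here, we examine the equivalence of the additivities of the classical capacity and the
entanglement of formation in detail. For this purpose, we prepare some notations.
The classical capacity $C_c(\kappa)$ is described by

$$C_c(\kappa) = \max_\rho \chi_\kappa(\rho),$$

where the Holevo information $\chi_\kappa(\rho)$ and the minimum average output entropy $H_\kappa(\rho)$
are defined by

$$\chi_\kappa(\rho) := H(\kappa(\rho)) - H_\kappa(\rho), \tag{9.6}$$

$$H_\kappa(\rho) := \min_{(p_x, \rho_x): \sum_x p_x \rho_x = \rho} \sum_x p_x H(\kappa(\rho_x)). \tag{9.7}$$

When $\kappa$ is the partial trace from the system $\mathcal{H}_A \otimes \mathcal{H}_B$ to $\mathcal{H}_B$, the relation (MSW
correspondence [13])

$$H_{\operatorname{Tr}_A}(\rho) = E_f(\rho) \tag{9.8}$$

holds, i.e.,

$$\chi_{\operatorname{Tr}_A}(\rho) = H(\operatorname{Tr}_A \rho) - E_f(\rho). \tag{9.9}$$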
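Now, we state the equivalence relations for several kinds of additivity conditions.


Theorem 9.3 (Matsumoto et al. [13], Shor [14], Pomeransky [15]) The following
14 conditions are equivalent.


HM  Additivity of classical capacity of q-q channel (additivity of maximum Holevo
information):

$$\max_{\rho^1} \chi_{\kappa^1}(\rho^1) + \max_{\rho^2} \chi_{\kappa^2}(\rho^2) = \max_{\rho^{1,2}} \chi_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) \tag{9.10}$$

holds for arbitrary channels $\kappa^1$ and $\kappa^2$.
HA  Additivity of Holevo information:

$$\chi_{\kappa^1}(\rho^1) + \chi_{\kappa^2}(\rho^2) = \chi_{\kappa^1 \otimes \kappa^2}(\rho^1 \otimes \rho^2) \tag{9.11}$$

holds for arbitrary channels $\kappa^1$ and $\kappa^2$ and arbitrary states $\rho^1$ and $\rho^2$.
HL  Additivity of classical capacity of q-q channel with linear cost constraint
(additivity of maximum Holevo information with linear cost constraint):

$$\max_\lambda C_{X^1 \le \lambda K}(\kappa^1) + C_{X^2 \le (1-\lambda)K}(\kappa^2) = C_{X^1 + X^2 \le K}(\kappa^1 \otimes \kappa^2), \tag{9.12}$$

i.e.,

$$\max_\lambda \max_{\rho^1: \operatorname{Tr}\rho^1 X^1 \le \lambda K} \chi_{\kappa^1}(\rho^1) + \max_{\rho^2: \operatorname{Tr}\rho^2 X^2 \le (1-\lambda)K} \chi_{\kappa^2}(\rho^2) = \max_{\rho^{1,2}: \operatorname{Tr}\rho^{1,2}(X^1 + X^2) \le K} \chi_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) \tag{9.13}$$

holds for arbitrary channels $\kappa^1$ and $\kappa^2$, arbitrary Hermitian matrices $X^1$ and $X^2$
on the respective input system, and an arbitrary constant $K$. Here we identify $X^1$
($X^2$) with $X^1 \otimes I^2$ ($I^1 \otimes X^2$). Note that the classical capacity of a q-q channel
with linear cost constraint has the form $C_{X \le K}(\kappa) = \max_{\rho: \operatorname{Tr}\rho X \le K} \chi_\kappa(\rho)$.
HC  Additivity of conjugate Holevo information:

$$\chi^*_{\kappa^1}(X^1) + \chi^*_{\kappa^2}(X^2) = \chi^*_{\kappa^1 \otimes \kappa^2}(X^1 + X^2) \tag{9.14}$$

holds for Hermitian matrices $X^1$ and $X^2$ on systems $\mathcal{H}_1$ and $\mathcal{H}_2$, where the conjugate
Holevo information $\chi^*_\kappa(X)$ is defined as the Legendre transform of $\chi_\kappa(\rho)$ as

$$\chi^*_\kappa(X) := \max_\rho \operatorname{Tr} X\rho + \chi_\kappa(\rho).$$

HS  Subadditivity of Holevo information:

$$\chi_{\kappa^1}(\rho^1) + \chi_{\kappa^2}(\rho^2) \ge \chi_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) \tag{9.15}$$

holds for arbitrary channels $\kappa^1$ and $\kappa^2$ and arbitrary states $\rho^{1,2}$, where $\rho^1 =
\operatorname{Tr}_2 \rho^{1,2}$ and $\rho^2 = \operatorname{Tr}_1 \rho^{1,2}$.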
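EM  Additivity of minimum output entropy:

$$\min_{\rho^1} H(\kappa^1(\rho^1)) + \min_{\rho^2} H(\kappa^2(\rho^2)) = \min_{\rho^{1,2}} H(\kappa^1 \otimes \kappa^2(\rho^{1,2})), \tag{9.16}$$

i.e.,

$$\min_{\rho^1} H_{\kappa^1}(\rho^1) + \min_{\rho^2} H_{\kappa^2}(\rho^2) = \min_{\rho^{1,2}} H_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) \tag{9.17}$$

holds for arbitrary channels $\kappa^1$ and $\kappa^2$. Note that the minimum output entropy
has the form $\min_\rho H(\kappa(\rho)) = \min_\rho H_\kappa(\rho)$.
EA  Additivity of minimum average output entropy:

$$H_{\kappa^1}(\rho^1) + H_{\kappa^2}(\rho^2) = H_{\kappa^1 \otimes \kappa^2}(\rho^1 \otimes \rho^2) \tag{9.18}$$

holds for arbitrary channels $\kappa^1$ and $\kappa^2$ and arbitrary states $\rho^1$ and $\rho^2$.
EL  Additivity of minimum average output entropy with linear cost constraint:

$$\min_\lambda \min_{\rho^1: \operatorname{Tr}\rho^1 X^1 \le \lambda K} H_{\kappa^1}(\rho^1) + \min_{\rho^2: \operatorname{Tr}\rho^2 X^2 \le (1-\lambda)K} H_{\kappa^2}(\rho^2) = \min_{\rho^{1,2}: \operatorname{Tr}\rho^{1,2}(X^1 + X^2) \le K} H_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) \tag{9.19}$$

holds for arbitrary channels $\kappa^1$ and $\kappa^2$, arbitrary Hermitian matrices $X^1$ and $X^2$
on the respective input system, and an arbitrary constant $K$.

EC  Additivity of conjugate minimum average output entropy:

$$H^*_{\kappa^1}(X^1) + H^*_{\kappa^2}(X^2) = H^*_{\kappa^1 \otimes \kappa^2}(X^1 + X^2) \tag{9.20}$$

holds for Hermitian matrices $X^1$ and $X^2$ on systems $\mathcal{H}_1$ and $\mathcal{H}_2$, where the conjugate
minimum average output entropy $H^*_\kappa(X)$ is defined as the Legendre transform
of $H_\kappa(\rho)$ as

$$H^*_\kappa(X) := \max_\rho \operatorname{Tr} X\rho - H_\kappa(\rho).$$

ES  Superadditivity of minimum average output entropy:

$$H_{\kappa^1}(\rho^1) + H_{\kappa^2}(\rho^2) \le H_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) \tag{9.21}$$

holds for arbitrary channels $\kappa^1$, $\kappa^2$ and arbitrary states $\rho^{1,2}$.
FA  Additivity of entanglement of formation:

$$E_f(\rho^1) + E_f(\rho^2) = E_f(\rho^1 \otimes \rho^2) \tag{9.22}$$

holds for a state $\rho^1$ on $\mathcal{H}_1 := \mathcal{H}_{A_1} \otimes \mathcal{H}_{B_1}$ and a state $\rho^2$ on $\mathcal{H}_2 := \mathcal{H}_{A_2} \otimes \mathcal{H}_{B_2}$.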
FL  Additivity of minimum entanglement of formation with linear cost constraint:

$$\min_\lambda \min_{\rho^1: \operatorname{Tr}\rho^1 X^1 \le \lambda K} E_f(\rho^1) + \min_{\rho^2: \operatorname{Tr}\rho^2 X^2 \le (1-\lambda)K} E_f(\rho^2) = \min_{\rho^{1,2}: \operatorname{Tr}\rho^{1,2}(X^1 + X^2) \le K} E_f(\rho^{1,2}) \tag{9.23}$$

holds for arbitrary Hermitian matrices $X^1$ and $X^2$ and an arbitrary constant $K$.
FC  Additivity of conjugate entanglement of formation:

$$E_f^*(X^1) + E_f^*(X^2) = E_f^*(X^1 + X^2) \tag{9.24}$$

holds for Hermitian matrices $X^1$ and $X^2$ on systems $\mathcal{H}_1$ and $\mathcal{H}_2$, where the conjugate
entanglement of formation $E_f^*(X)$ is defined as the Legendre transform of
$E_f(\rho)$ as

$$E_f^*(X) := \max_\rho \operatorname{Tr} X\rho - E_f(\rho).$$

FS  Superadditivity of entanglement of formation:

$$E_f(\rho^1) + E_f(\rho^2) \le E_f(\rho^{1,2}) \tag{9.25}$$

holds for a state $\rho^{1,2}$ on $(\mathcal{H}_{A_1} \otimes \mathcal{H}_{A_2}) \otimes (\mathcal{H}_{B_1} \otimes \mathcal{H}_{B_2})$.

However, as shown in Sect. 8.13, FS does not hold. Hence, all of the above condi- 
tions are invalid. However, the papers [8, 17] numerically verified HM in the qubit 
case. HM has been shown in the following cases. 

(a) When C,.(«) is equal to the dimension of the output system, additivity (9.3) holds 
(trivial case). 

(b) Any entanglement-breaking channel $\kappa^1$ (Example 5.4) satisfies additivity (9.5)
with an arbitrary channel $\kappa^2$ [12].

(c) Any depolarizing channel $\kappa_{d,\lambda}$ (Example 5.3) satisfies additivity (9.5) with $\kappa^1 =
\kappa_{d,\lambda}$ and an arbitrary channel $\kappa^2$ [11].

(d) Any unital qubit channel $\kappa^1$ satisfies additivity (9.5) with an arbitrary channel $\kappa^2$
[10].

(e) Any antisymmetric channel $\kappa^T_{d,-\frac{1}{d-1}}$ (Werner–Holevo channels, Example 5.9)
satisfies additivity (9.3) with $\kappa = \kappa^T_{d,-\frac{1}{d-1}}$ [18].

(f) All transpose depolarizing channels $\kappa^T_{d,\lambda}$ and $\kappa^T_{d',\lambda'}$ satisfy additivity (9.5) with
$\kappa^1 = \kappa^T_{d,\lambda}$ and $\kappa^2 = \kappa^T_{d',\lambda'}$ [19, 20].

(g1) Channels $\kappa_{d,\lambda} \circ \kappa^{PD}_D$ satisfy additivity (9.5) with $\kappa^1 = \kappa_{d,\lambda} \circ \kappa^{PD}_D$ and an arbitrary
channel $\kappa^2$ [9, 21].


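(g2) Channels $\kappa^T_{d,-\frac{1}{d-1}} \circ \kappa^{PD}_D$ and $\kappa^{PD}_D \circ \kappa^T_{d,-\frac{1}{d-1}}$ satisfy additivity (9.3) with $\kappa =
\kappa^T_{d,-\frac{1}{d-1}} \circ \kappa^{PD}_D$ or $\kappa^{PD}_D \circ \kappa^T_{d,-\frac{1}{d-1}}$ [9, 22].

(g3) Channels $\kappa^T_{d,\lambda} \circ \kappa^{PD}_D$ and $\kappa^{PD}_D \circ \kappa^T_{d,\lambda}$ satisfy additivity (9.5) with $\kappa^1 = \kappa^T_{d,\lambda} \circ \kappa^{PD}_D$
or $\kappa^{PD}_D \circ \kappa^T_{d,\lambda}$ and $\kappa^2 = \kappa^T_{d',\lambda'} \circ \kappa^{PD}_{D'}$ or $\kappa^{PD}_{D'} \circ \kappa^T_{d',\lambda'}$ [9, 22].

Therefore, we obtain $C_c^e(\kappa) = C_c(\kappa)$ in the cases (a), (b), (c), (d), (e), (g1), and
(g2). Indeed, since $C_c(\kappa_{d,\lambda} \circ \kappa^{PD}_D) = C_c(\kappa_{d,\lambda})$, (c) yields that

$$C_c(\kappa_{d,\lambda} \circ \kappa^{PD}_D \otimes \kappa^2) \le C_c(\kappa_{d,\lambda} \otimes \kappa^2) = C_c(\kappa_{d,\lambda}) + C_c(\kappa^2) = C_c(\kappa_{d,\lambda} \circ \kappa^{PD}_D) + C_c(\kappa^2),$$

which implies (g1). Similarly, we
can show (g2) and (g3) from (e) and (f).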

Moreover, the additivity of minimum output entropy EM holds not only in the 
above cases but also in the more extended cases of (c), (e), and (f) as opposed to 
(g1), (g2), and (g3) [9, 22]. Since the condition EM is simple, it has been mainly 
discussed for verifying these conjectures. 

Before proceeding to the proof of the equivalence, we give a counter example 
of EM by modifying the discussion in Sect.8.13 [23]. First, for a given [cn]- 
dimensional subspace KC, we choose the isometry V and the spaces Hc¢,; and Hc.,2 as 
in the proof of Lemma 8.16. We also use the notations H4.1, H4.2, 12,1, and Hz.2 
given in Sect. 8.13. We define the TP-CP map «; from the system He to 74,; for 
i=1,2as 


K1(p) = Trg.iVpV*, 2(p) = Tra2VpV’. (9.26) 


Then, Lemma 8.16 implies that 
Cc Cc 
sd A((K1 @ K2)(p)) < 20 — D logk + hG). (9.27) 


Next, for given e, e’ > Oandc > 0, we choose a sufficiently large n. Then, we choose 
a [cn]-dimensional subspace K given in Theorem 8.14 such that 


2 
(—4clog 4) +2,/—2clog $1 —2clog$\ 
l-€ k 


min H (K;(p)) > logk 
p 


for $i = 1, 2$. Hence, with a sufficiently large $k$, the relation (8.258) guarantees that

$$\min_\rho H(\kappa_1(\rho)) + \min_\rho H(\kappa_2(\rho)) > \min_\rho H((\kappa_1 \otimes \kappa_2)(\rho)), \tag{9.28}$$

which contradicts EM.

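Now, we start to show the equivalence of these conditions. Among the above
conditions, the relations HC⇒HM and EC⇒EM are trivial. From the MSW correspondence
(9.8) we obtain HA⇒FA and EA⇒FA. Next, we focus on the Stinespring
representation $(\mathcal{H}_C, \rho_0, U_\kappa)$ of $\kappa$ mapping from a system $\mathcal{H}_A$ to another system $\mathcal{H}_B$.
In this case, the MSW correspondence (9.8) can be generalized as

$$H_\kappa(\rho) = \min_{(p_i, \rho_i): \sum_i p_i \rho_i = \rho} \sum_i p_i H\big(\operatorname{Tr}_{A,C} U_\kappa(\rho_i \otimes \rho_0)U_\kappa^*\big) = E_f(\bar{\kappa}(\rho)),$$

$$\bar{\kappa}(\rho) := U_\kappa(\rho \otimes \rho_0)U_\kappa^*,$$

i.e.,

$$\chi_\kappa(\rho) = H(\kappa(\rho)) - E_f(\bar{\kappa}(\rho)), \tag{9.29}$$

where we use the notation $E_f$ as the entanglement of formation between the output
system $\mathcal{H}_B$ and the environment $\mathcal{H}_A \otimes \mathcal{H}_C$. Hence, if Condition FS holds, for $\rho^{1,2}$,
we have

$$\begin{aligned}
\chi_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) &= H(\kappa^1 \otimes \kappa^2(\rho^{1,2})) - E_f(\bar{\kappa}^1 \otimes \bar{\kappa}^2(\rho^{1,2})) \\
&\le H(\kappa^1(\rho^1)) + H(\kappa^2(\rho^2)) - E_f(\bar{\kappa}^1 \otimes \bar{\kappa}^2(\rho^{1,2})) \\
&\le H(\kappa^1(\rho^1)) + H(\kappa^2(\rho^2)) - \big(E_f(\bar{\kappa}^1(\rho^1)) + E_f(\bar{\kappa}^2(\rho^2))\big) \\
&= \chi_{\kappa^1}(\rho^1) + \chi_{\kappa^2}(\rho^2).
\end{aligned}$$

Hence, we have FS⇒HS. Similarly, the relation FS⇒ES holds.
The following lemma is useful for proofs of the remaining relations.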


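Lemma 9.1 Let $f^i$ be a convex function defined on $\mathcal{S}(\mathcal{H}_i)$ $(i = 1, 2)$ and $f^{1,2}$ be a
convex function defined on $\mathcal{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ satisfying

$$f^1(\rho^1) + f^2(\rho^2) \ge f^{1,2}(\rho^1 \otimes \rho^2). \tag{9.30}$$

The relations L⇔C⇔S⇒A hold among the following conditions.
S  Superadditivity:

$$f^1(\rho^1) + f^2(\rho^2) \le f^{1,2}(\rho^{1,2}) \tag{9.31}$$

holds for a state $\rho^{1,2}$ on $(\mathcal{H}_{A_1} \otimes \mathcal{H}_{A_2}) \otimes (\mathcal{H}_{B_1} \otimes \mathcal{H}_{B_2})$.
C  Additivity of conjugate function:

$$f^{1*}(X^1) + f^{2*}(X^2) = f^{1,2*}(X^1 + X^2) \tag{9.32}$$

holds for Hermitian matrices $X^1$ and $X^2$ on the systems $\mathcal{H}_1$ and $\mathcal{H}_2$, where the
conjugate function $f^*(X)$ is defined as the Legendre transform of $f(\rho)$ as

$$f^*(X) := \max_\rho \operatorname{Tr} X\rho - f(\rho).$$

L  Additivity of minimum value with linear cost constraint:

$$\min_\lambda \min_{\rho^1: \operatorname{Tr}\rho^1 X^1 \le \lambda K} f^1(\rho^1) + \min_{\rho^2: \operatorname{Tr}\rho^2 X^2 \le (1-\lambda)K} f^2(\rho^2) = \min_{\rho^{1,2}: \operatorname{Tr}\rho^{1,2}(X^1 + X^2) \le K} f^{1,2}(\rho^{1,2}) \tag{9.33}$$

holds for arbitrary Hermitian matrices $X^1$ and $X^2$ and an arbitrary constant $K$.
A  Additivity:

$$f^1(\rho^1) + f^2(\rho^2) = f^{1,2}(\rho^1 \otimes \rho^2) \tag{9.34}$$

holds for a state $\rho^1$ on $\mathcal{H}_1 := \mathcal{H}_{A_1} \otimes \mathcal{H}_{B_1}$ and a state $\rho^2$ on $\mathcal{H}_2 := \mathcal{H}_{A_2} \otimes \mathcal{H}_{B_2}$.

Fig. 9.2 Relations among the conditions HM, HA, HL, HC, HS, EM, EA, EL, EC, ES, FA, FL, FC, and FS (arrow types: Lemma 9.1, easy, MSW correspondence, hard)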


Lemma 9.1 yields the relations HL⇔HC⇔HS⇒HA, EL⇔EC⇔ES⇒EA, and
FL⇔FC⇔FS⇒FA.

Hence, if we prove the relations HM⇒HC, EM⇒EC, and FA⇒FS, we obtain
the equivalence among the above 14 conditions. These proofs
will be given in Sect. 9.8. The relations are summarized in Fig. 9.2.

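Finally, we prove the additivity of the classical capacity for entanglement-breaking
channels by using inequality (5.110). From the definition, for any entanglement-breaking
channel $\kappa^1$, the output state for any input state $\rho_x$ has the form

$$(\kappa^1 \otimes \iota_2)(\rho_x) = \sum_y Q_y^x\, \rho^1_{x,y} \otimes \rho^2_{x,y},$$

which implies that

$$(\kappa^1 \otimes \kappa^2)(\rho_x) = \sum_y Q_y^x\, \rho^1_{x,y} \otimes \kappa^2(\rho^2_{x,y}). \tag{9.35}$$

Hence, using (5.110) and (5.86), we have

$$\begin{aligned}
C_c(\kappa^1 \otimes \kappa^2)
&= \max_\rho H((\kappa^1 \otimes \kappa^2)(\rho)) - \min_{(p_x, \rho_x): \sum_x p_x \rho_x = \rho} \sum_x p_x H((\kappa^1 \otimes \kappa^2)(\rho_x)) \\
&\overset{(a)}{=} \max_\rho H((\kappa^1 \otimes \kappa^2)(\rho)) - \min_{(p_x, \rho_x): \sum_x p_x \rho_x = \rho} \sum_x p_x H\Big(\sum_y Q_y^x\, \rho^1_{x,y} \otimes \kappa^2(\rho^2_{x,y})\Big) \\
&\overset{(b)}{\le} H(\operatorname{Tr}_2(\kappa^1 \otimes \kappa^2)(\rho)) + H(\operatorname{Tr}_1(\kappa^1 \otimes \kappa^2)(\rho)) \\
&\quad - \min_{(p_x, \rho_x): \sum_x p_x \rho_x = \rho} \sum_x p_x \Big( H\Big(\sum_y Q_y^x \rho^1_{x,y}\Big) + \sum_y Q_y^x H(\kappa^2(\rho^2_{x,y})) \Big) \\
&\le H(\kappa^1(\operatorname{Tr}_2 \rho)) - \min_{(p_x, \rho_x): \sum_x p_x \rho_x = \rho} \sum_x p_x H\Big(\sum_y Q_y^x \rho^1_{x,y}\Big) \\
&\quad + H(\kappa^2(\operatorname{Tr}_1 \rho)) - \min_{(p_x, \rho_x): \sum_x p_x \rho_x = \rho} \sum_{x,y} p_x Q_y^x H(\kappa^2(\rho^2_{x,y})) \\
&\overset{(c)}{\le} C_c(\kappa^1) + C_c(\kappa^2),
\end{aligned} \tag{9.36}$$

where (a) follows from (9.35), (b) follows from (5.110) and (5.86), and (c) follows from
the relations $\operatorname{Tr}_2 \rho = \sum_x p_x \operatorname{Tr}_2 \rho_x$, $\kappa^1(\operatorname{Tr}_2 \rho_x) = \operatorname{Tr}_2 (\kappa^1 \otimes \iota_2)(\rho_x) = \sum_y Q_y^x \rho^1_{x,y}$, and
$\operatorname{Tr}_1 \rho = \sum_{x,y} p_x Q_y^x \rho^2_{x,y}$. Then, (9.36) implies (9.5) for an entanglement-breaking
channel $\kappa^1$ and an arbitrary channel $\kappa^2$.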


Exercises 


9.2 Using a discussion similar to (9.36), show the additivity of minimum output
entropy when $\kappa^1$ is entanglement breaking.


9.3 Prove Theorem 9.2 by referring to Theorem 4.1 and the proof of Theorem 4.2. 


9.3 C-Q Channel Coding with Shared Entanglement


In the preceding section, we considered the effectiveness of using an input state
entangled between the systems that are to be sent. In this section, we will consider
the usefulness of entangled states $\rho^{A,B}$ on a composite system $\mathcal{H}_A \otimes \mathcal{H}_B$ that are
a priori shared between the sender and the receiver. If the sender wishes to send some
information corresponding to an element $i$ of $\{1, \ldots, N\}$, he or she must perform an
operation $\varphi_e(i)$ on the system $\mathcal{H}_A$ according to the element $i$, then send the system
$\mathcal{H}_A$ to the receiver using the quantum channel $\kappa$. Then, the receiver performs a
measurement (POVM) $Y = \{Y_i\}_{i=1}^{N}$ on the composite system $\mathcal{H}_{A'} \otimes \mathcal{H}_B$. Note that
this measurement is performed not only on the output system $\mathcal{H}_{A'}$ of the quantum
channel $\kappa$ but on the whole composite system $\mathcal{H}_{A'} \otimes \mathcal{H}_B$.

Consider the simple case in which the systems $\mathcal{H}_A$, $\mathcal{H}_{A'}$, and $\mathcal{H}_B$ are all quantum
two-level systems. Let the initial state $\rho^{A,B}$ be the pure state $\frac{1}{\sqrt{2}}\big(|u_0^A \otimes u_0^B\rangle + |u_1^A \otimes
u_1^B\rangle\big)$. Assume that there is no noise in the quantum channel, which enables the perfect
transmission of the quantum state. In this case, we send the message $i \in \{0, \ldots, 3\}$ by
applying the unitary transformation $S_i^A$ on system $\mathcal{H}_A$. Then, the receiver possesses
the transmitted system as well as the initially shared system. The state of the composite
system $(\mathbb{C}^2)^{\otimes 2}$ of the receiver is given by $(S_i^A \otimes I_B)\frac{1}{\sqrt{2}}\big(|u_0^A \otimes u_0^B\rangle + |u_1^A \otimes u_1^B\rangle\big)$.
Since these vectors form an orthogonal basis with $i = 0, 1, 2, 3$, we can perform a
measurement $Y$ comprising this basis. Hence, this measurement provides error-free
decoding. According to this protocol, two bits of information may be sent through
only one qubit channel. We observe that by sharing an entangled state between two
parties a priori, more information can be sent than simply by sending a quantum state
[24]. This protocol is often called superdense coding.

Fig. 9.3 C-q channel coding with shared entanglement with noiseless channel (original message, encoder, decoder, recovered message; the entangled state $(\rho^{A,B})^{\otimes n}$ is shared by the encoder and the decoder)
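
The superdense coding protocol just described can be checked in a few lines. This is a minimal sketch, assuming that the $S_i$ are the identity and the three Pauli matrices and that the shared state is $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$; the names `states`, `gram`, and `decode` are illustrative only. It builds the four states $(S_i \otimes I)$ applied to the shared state, verifies that they are orthonormal, and decodes by projecting onto that basis.

```python
import numpy as np

# S_0..S_3: identity and Pauli matrices (assumed to match the S_i of the text)
S = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

# Shared maximally entangled state (|00> + |11>) / sqrt(2), tensor order A (x) B
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Encoding: the sender applies S_i to system A only
states = [np.kron(S_i, np.eye(2)) @ phi for S_i in S]

# Gram matrix of the four encoded states; the identity means an orthonormal basis
gram = np.array([[np.vdot(u, v) for v in states] for u in states])

def decode(received):
    """Measure in the basis {states[i]} and return the most likely message."""
    probs = [abs(np.vdot(s, received)) ** 2 for s in states]
    return int(np.argmax(probs))

decoded = [decode(s) for s in states]
```

Since the Gram matrix is the identity, each of the four messages is recovered with certainty, which is the two-bits-per-qubit behavior described above.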

However, the initially shared entangled state is not necessarily a maximally entangled
state such as $\frac{1}{\sqrt{2}}\big(|u_0^A \otimes u_0^B\rangle + |u_1^A \otimes u_1^B\rangle\big)$ in general. Hence, it is an important
question to determine how much a partially entangled state shared between the sender
and the receiver improves the classical capacity. This will give a quantitative measure
of the utilizable entanglement of a partially entangled state.

Assume that the sender and the receiver share the partially entangled state
$(\rho^{A,B})^{\otimes n}$ on $\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_B^{\otimes n}$. The code is then given by the set $\Phi_e^{(n)} = (N_n, \mathcal{H}_{A'_n}, \varphi_e^{(n)},
Y^{(n)})$ consisting of its size $N_n$, the quantum system $\mathcal{H}_{A'_n}$ transmitted by the sender to
the receiver, the operation $\varphi_e^{(n)}(i)$ from the quantum system $\mathcal{H}_A^{\otimes n}$ to $\mathcal{H}_{A'_n}$ depending
on each message $i$, and the measurement $Y^{(n)}$ on the composite system $\mathcal{H}_{A'_n} \otimes \mathcal{H}_B^{\otimes n}$,
as in Fig. 9.3.

Further, the effectiveness of an entangled state $\rho^{A,B}$, i.e., the increase of the transmitted
message, is given by

$$|\Phi_e^{(n)}| := \frac{N_n}{\dim \mathcal{H}_{A'_n}},$$

and the error probability is given by

$$\varepsilon[\Phi_e^{(n)}] := \frac{1}{N_n}\sum_{i=1}^{N_n}\Big(1 - \operatorname{Tr}\Big[Y_i^{(n)}\big(\varphi_e^{(n)}(i) \otimes \iota_B^{\otimes n}\big)\big((\rho^{A,B})^{\otimes n}\big)\Big]\Big).$$

Hence, the amount of assistance for sending information by the state $\rho^{A,B}$ can be
quantified as [2]

$$C_a(\rho^{A,B}) := \sup_{\{\Phi_e^{(n)}\}} \Big\{ \lim \frac{1}{n}\log|\Phi_e^{(n)}| \,\Big|\, \lim \varepsilon[\Phi_e^{(n)}] = 0 \Big\}. \tag{9.37}$$

[2] The subscript a expresses "assistance."

Then, we obtain the following theorem.


Theorem 9.4 The quantity $\frac{1}{n}\min_\kappa H_{(\kappa \otimes \iota_B^{\otimes n})((\rho^{A,B})^{\otimes n})}(A|B)$ converges as $n \to \infty$ and

$$C_a(\rho^{A,B}) = -\lim_{n\to\infty} \frac{1}{n}\min_\kappa H_{(\kappa \otimes \iota_B^{\otimes n})((\rho^{A,B})^{\otimes n})}(A|B), \tag{9.38}$$

where $\kappa$ is a TP-CP map from $\mathcal{H}_A^{\otimes n}$ to $\mathcal{H}_{A'}$ [25–29]. We assume that the output system
$\mathcal{H}_{A'}$ can be chosen depending on $\kappa$.


When the initial state $\rho^{A,B}$ is a maximally correlated state, $\min_\kappa H_{(\kappa \otimes \iota_B^{\otimes n})((\rho^{A,B})^{\otimes n})}(A|B)
= nH_{\rho^{A,B}}(A|B)$, i.e., $C_a(\rho^{A,B}) = -H_{\rho^{A,B}}(A|B)$. Certainly, this equation holds when
condition (8.129) is satisfied. In particular, if $\rho^{A,B}$ is a pure state, we have $C_a(\rho^{A,B}) =
H(\rho^A)$.


Proof We first show that 


Ca(p*") = H(p") — min H((% ® tg)(0""*)) (9.39) 


in order to obtain the > part of (9.38). Let &,, be the channel argmin,. H((K @ 
tp)(p*’8)). We denote the output system of «,, and its dimension by Hy and 


d, respectively. Now, we focus on the c-q channel (i, 7) > Wij) = (x! i, ® 
Ip)*(K @® ta)(o®) (Xi, Zi, ® Ig) with the set of input signals VY 2G Dhezi,j<a- 
Using Theorem 4.1 and Exercise 5.10, we see that the capacity of this channel is 
larger than 


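$$\begin{aligned}
& H\Bigl(\sum_{(i,j)}\frac{1}{d^2}(X_{A'}^i Z_{A'}^j \otimes I_B)^*(\kappa_m\otimes\iota_B)(\rho^{A,B})(X_{A'}^i Z_{A'}^j \otimes I_B)\Bigr) \\
&\quad - \sum_{(i,j)}\frac{1}{d^2}H\bigl((X_{A'}^i Z_{A'}^j \otimes I_B)^*(\kappa_m\otimes\iota_B)(\rho^{A,B})(X_{A'}^i Z_{A'}^j \otimes I_B)\bigr) \\
&= H\bigl(\rho_{\mathrm{mix}}^{A'} \otimes \mathrm{Tr}_A(\kappa_m\otimes\iota_B)(\rho^{A,B})\bigr) - \frac{1}{d^2}\sum_{(i,j)} H\bigl((\kappa_m\otimes\iota_B)(\rho^{A,B})\bigr) \\
&= H\bigl(\rho_{\mathrm{mix}}^{A'} \otimes \mathrm{Tr}_A\,\rho^{A,B}\bigr) - H\bigl((\kappa_m\otimes\iota_B)(\rho^{A,B})\bigr) \\
&= \log d + H(\mathrm{Tr}_A\,\rho^{A,B}) - H\bigl((\kappa_m\otimes\iota_B)(\rho^{A,B})\bigr).
\end{aligned}$$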
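The first equality in this chain rests on the fact that averaging over all $d^2$ operators $X^i Z^j$ on the first system replaces that system's state by the completely mixed state. A minimal numerical sketch of this step follows; the dimensions, the random test state, and the function names are assumptions made for illustration.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in nats."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def weyl_operators(d):
    """All d*d operators X^i Z^j, with X|k> = |k+1 mod d> and Z|k> = w^k |k>."""
    X = np.roll(np.eye(d), 1, axis=0)
    Z = np.diag(np.exp(2j * np.pi * np.arange(d) / d))
    return [np.linalg.matrix_power(X, i) @ np.linalg.matrix_power(Z, j)
            for i in range(d) for j in range(d)]

d, dB = 3, 2                      # assumed toy dimensions
rng = np.random.default_rng(0)
G = rng.normal(size=(d * dB, d * dB)) + 1j * rng.normal(size=(d * dB, d * dB))
rho = G @ G.conj().T
rho /= np.trace(rho).real         # random density matrix on the composite system

# Uniform average of the encoded states.
avg = sum(np.kron(U, np.eye(dB)) @ rho @ np.kron(U, np.eye(dB)).conj().T
          for U in weyl_operators(d)) / d**2

rho_B = np.einsum('ijik->jk', rho.reshape(d, dB, d, dB))   # partial trace over the first system
expected = np.kron(np.eye(d) / d, rho_B)                   # rho_mix (x) rho^B
```

Here `avg` coincides with `expected`, and its entropy is $\log d$ plus the entropy of the reduced state, which is the last line of the chain.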


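From the definition of $|\Phi_e^{(n)}|$, we immediately obtain (9.39). Fixing $n$ and applying the same argument to $\kappa_n \stackrel{\mathrm{def}}{=} \mathrm{argmin}_\kappa H((\kappa\otimes\iota_B^{\otimes n})((\rho^{A,B})^{\otimes n}))$, we obtain $C_a(\rho^{A,B}) \ge H(\mathrm{Tr}_A\,\rho^{A,B}) - \frac{1}{n}\min_\kappa H((\kappa\otimes\iota_B^{\otimes n})((\rho^{A,B})^{\otimes n}))$. Therefore, we have $C_a(\rho^{A,B}) \ge H(\mathrm{Tr}_A\,\rho^{A,B}) - \inf_n \frac{1}{n}\min_\kappa H((\kappa\otimes\iota_B^{\otimes n})((\rho^{A,B})^{\otimes n}))$. Since $nH(\mathrm{Tr}_A\,\rho^{A,B}) - \min_\kappa H((\kappa\otimes\iota_B^{\otimes n})((\rho^{A,B})^{\otimes n}))$ satisfies the assumptions of Lemma A.1, this quantity divided by $n$ converges as $n \to \infty$. We therefore obtain (9.38) with the $\ge$ sign.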

Next, we prove the $\le$ part of (9.38). Let $X$ be a random variable taking values in $\{1,\ldots,N_n\}$ and following the uniform distribution. Let $Y$ be the decoded message at the receiver as the random variable taking values in $\{1,\ldots,N_n\}$. Since $H(X) = \log N_n$, the Fano inequality yields that

$$I(X:Y) \ge H(X) - \log 2 - \varepsilon[\Phi_e^{(n)}]\log N_n = -\log 2 + \log N_n\,(1-\varepsilon[\Phi_e^{(n)}]). \tag{9.40}$$

Using the monotonicity of the quantum relative entropy and (5.86), it can be shown that (Exercise 9.4)

$$I(X:Y) \le nH(\mathrm{Tr}_A\,\rho^{A,B}) + \log\dim\mathcal{H}_{A'_n} - \min_\kappa H((\kappa\otimes\iota_B^{\otimes n})((\rho^{A,B})^{\otimes n})). \tag{9.41}$$

Combining this inequality with (9.40), we obtain

$$H(\mathrm{Tr}_A\,\rho^{A,B}) - \frac{1}{n}\min_\kappa H((\kappa\otimes\iota_B^{\otimes n})((\rho^{A,B})^{\otimes n})) + \frac{\log 2}{n} \ge \frac{\log N_n}{n}(1-\varepsilon[\Phi_e^{(n)}]) - \frac{\log\dim\mathcal{H}_{A'_n}}{n}.$$

Taking the limit $n \to \infty$, we have

$$H(\mathrm{Tr}_A\,\rho^{A,B}) - \lim_{n\to\infty}\frac{1}{n}\min_\kappa H((\kappa\otimes\iota_B^{\otimes n})((\rho^{A,B})^{\otimes n})) \ge \limsup_{n\to\infty}\frac{\log N_n - \log\dim\mathcal{H}_{A'_n}}{n},$$

which gives the $\le$ part of (9.38). ∎


We assumed above that there was no noise in the quantum channel. Since real quantum channels always contain some noise, we often restrict our channel to a given TP-CP map $\kappa$. Now, consider the case in which the quantum channel $\kappa$ has some noise, but the sender and the receiver are allowed access to any entangled state. Let us also say that the quantum channel $\kappa$ can be used $n$ times (i.e., $\kappa^{\otimes n}$), as considered previously.

First, we prepare an entangled pure state $x^{(n)}$ on the composite system $\mathcal{H}_{A'_n} \otimes \mathcal{H}_{R_n}$, comprising quantum system $\mathcal{H}_{A'_n}$ at the sender and quantum system $\mathcal{H}_{R_n}$ at the receiver. Let the size of the code be $N_n$, and let an element $i \in \{1,\ldots,N_n\}$ be transmitted. Next, the sender performs the operation $\varphi_e^{(n)}(i)$ from the system $\mathcal{H}_{A'_n}$ to the other system $\mathcal{H}_A^{\otimes n}$ depending on $i = 1,\ldots,N_n$. Then, the state on $\mathcal{H}_A^{\otimes n}$ is transmitted to the receiver via the given quantum channel $\kappa^{\otimes n}$. The receiver performs a measurement $Y^{(n)}$ on the composite system $\mathcal{H}_B^{\otimes n} \otimes \mathcal{H}_{R_n}$, thereby recovering the original signal $i$. In this case, our code can be described by the set $(\mathcal{H}_{A'_n}, \mathcal{H}_{R_n}, x^{(n)}, N_n, \varphi_e^{(n)}, Y^{(n)})$, and is denoted by $\Phi_e^{(n),2}$. Hence, the size of the code and its error probability are given by

$$|\Phi_e^{(n),2}| \stackrel{\mathrm{def}}{=} N_n,$$

$$\varepsilon[\Phi_e^{(n),2}] \stackrel{\mathrm{def}}{=} \frac{1}{N_n}\sum_{i=1}^{N_n}\Bigl(1-\mathrm{Tr}\bigl[(\kappa^{\otimes n}\circ\varphi_e^{(n)}(i)\otimes\iota_{R_n})(|x^{(n)}\rangle\langle x^{(n)}|)\,Y_i^{(n)}\bigr]\Bigr).$$
The entanglement-assisted classical capacity $C^e_{c,e}(\kappa)$³ is given by

$$C^e_{c,e}(\kappa) \stackrel{\mathrm{def}}{=} \sup\Bigl\{\lim \frac{1}{n}\log|\Phi_e^{(n),2}| \Bigm| \lim_{n\to\infty}\varepsilon[\Phi_e^{(n),2}] = 0\Bigr\}. \tag{9.42}$$

Theorem 9.5 (Bennett et al. [30, 31], Holevo [32]) The entanglement-assisted classical capacity $C^e_{c,e}(\kappa)$ of a quantum-quantum channel $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$ is

$$C^e_{c,e}(\kappa) = \max_\rho I(\rho,\kappa), \tag{9.43}$$

where $I(\rho,\kappa)$ is the transmission information of a quantum-quantum channel defined in (8.36).
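To make (9.43) concrete, the following sketch evaluates the transmission information for a qubit depolarizing channel at the completely mixed input. It assumes the usual expression $I(\rho,\kappa) = H(\rho) + H(\kappa(\rho)) - H((\kappa\otimes\iota_R)(|x\rangle\langle x|))$ with $x$ a purification of $\rho$; the Kraus representation and the parameter `q` are illustrative assumptions, and no maximization over $\rho$ is attempted.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in nats."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def transmission_information(rho, kraus):
    """I(rho, kappa) = H(rho) + H(kappa(rho)) - H((kappa (x) id)(|x><x|))."""
    d = rho.shape[0]
    w, v = np.linalg.eigh(rho)
    # Purification |x> = sum_k sqrt(w_k) |v_k>_A |k>_R.
    x = sum(np.sqrt(max(wk, 0.0)) * np.kron(v[:, k], np.eye(d)[k])
            for k, wk in enumerate(w))
    proj = np.outer(x, x.conj())
    out = sum(K @ rho @ K.conj().T for K in kraus)
    joint = sum(np.kron(K, np.eye(d)) @ proj @ np.kron(K, np.eye(d)).conj().T
                for K in kraus)
    return entropy(rho) + entropy(out) - entropy(joint)

def depolarizing_kraus(q):
    """Kraus operators of rho -> (1 - q) rho + q I/2 (assumed parametrization)."""
    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - 3 * q / 4) * I2] + [np.sqrt(q / 4) * P for P in (X, Y, Z)]

rho_mix = np.eye(2) / 2
i_noiseless = transmission_information(rho_mix, depolarizing_kraus(0.0))
i_noisy = transmission_information(rho_mix, depolarizing_kraus(0.2))
```

For the noiseless channel the value is $2\log 2$, the dense-coding rate; with noise it drops by the entropy of the weights $(1-3q/4, q/4, q/4, q/4)$.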


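In a manner similar to $J(p,\sigma,W)$, we define $J(\rho,\sigma,\kappa)$ as

$$\begin{aligned}
J(\rho,\sigma,\kappa) &\stackrel{\mathrm{def}}{=} \mathrm{Tr}\,(\kappa\otimes\iota_R)(|x\rangle\langle x|)\bigl(\log(\kappa\otimes\iota_R)(|x\rangle\langle x|) - \log\rho\otimes\sigma\bigr) \\
&= H(\rho) - \mathrm{Tr}\,\kappa(\rho)\log\sigma - H_e(\rho,\kappa) = \tilde{I}_c(\rho,\kappa) - \mathrm{Tr}\,\kappa(\rho)\log\sigma,
\end{aligned} \tag{9.44}$$

where $x$ is a purification of $\rho$. Then, $J(\rho,\sigma,\kappa)$ is concave for $\rho$ because of (8.50), and is convex for $\sigma$. Since

$$J(\rho,\sigma,\kappa) = I(\rho,\kappa) + D(\kappa(\rho)\|\sigma), \tag{9.45}$$

in a manner similar to (4.71), Lemma A.9 guarantees that

$$C^e_{c,e}(\kappa) = \max_\rho I(\rho,\kappa) = \max_\rho\min_\sigma J(\rho,\sigma,\kappa) = \min_\sigma\max_\rho J(\rho,\sigma,\kappa). \tag{9.46}$$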


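Proof We first construct a code attaining the right-hand side (RHS) of (9.43), i.e., we prove the $\ge$ part in (9.43) for $\mathrm{argmax}_\rho I(\rho,\kappa) = \rho^A_{\mathrm{mix}}$. Let $\rho^{A,R}$ be the purification of $\rho^A_{\mathrm{mix}}$. Perform the encoding operation using the operation $\rho^{A,R} \mapsto \rho^{A,R}_{(i,j)} \stackrel{\mathrm{def}}{=} (X_A^i Z_A^j \otimes I)(\rho^{A,R})(X_A^i Z_A^j \otimes I)^*$ at $A$, as in the case of a noise-free channel. Since

$$\sum_{i,j}\frac{1}{d^2}(\kappa\otimes\iota_R)(\rho^{A,R}_{(i,j)}) = (\kappa\otimes\iota_R)\Bigl(\sum_{i,j}\frac{1}{d^2}\rho^{A,R}_{(i,j)}\Bigr) = (\kappa\otimes\iota_R)(\rho^A_{\mathrm{mix}}\otimes\rho^R_{\mathrm{mix}}) = \kappa(\rho^A_{\mathrm{mix}})\otimes\rho^R_{\mathrm{mix}},$$

we obtain

$$\begin{aligned}
&\sum_{i,j}\frac{1}{d^2}D\Bigl((\kappa\otimes\iota_R)(\rho^{A,R}_{(i,j)}) \Bigm\| \sum_{i,j}\frac{1}{d^2}(\kappa\otimes\iota_R)(\rho^{A,R}_{(i,j)})\Bigr) \\
&= \sum_{i,j}\frac{1}{d^2}D\bigl((\kappa\otimes\iota_R)(\rho^{A,R}_{(i,j)}) \bigm\| \kappa(\rho^A_{\mathrm{mix}})\otimes\rho^R_{\mathrm{mix}}\bigr) = I(\rho^A_{\mathrm{mix}},\kappa).
\end{aligned}$$

Combining this equation with the argument given in Theorem 4.1, we find a code attaining $I(\rho^A_{\mathrm{mix}},\kappa)$.

³The second subscript, e, of $C^e_{c,e}$ indicates the shared "entanglement." The superscript e indicates "entangled" operations between sending systems.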
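Now, consider the case when $I(\rho^A_{\mathrm{mix}},\kappa) = \max_\rho I(\rho,\kappa)$ does not hold. Let $\rho^{K_n}_{\mathrm{mix}}$ be the completely mixed state on a subspace $K_n$ of $\mathcal{H}^{\otimes n}$. If we can take the state $\rho^{K_n}_{\mathrm{mix}}$ such that

$$\lim_{n\to\infty}\frac{1}{n}I(\rho^{K_n}_{\mathrm{mix}},\kappa^{\otimes n}) = \max_\rho I(\rho,\kappa), \tag{9.47}$$

we can construct a code attaining $\max_\rho I(\rho,\kappa)$. To choose such a subspace $K_n$, let $\rho_M \stackrel{\mathrm{def}}{=} \mathrm{argmax}_\rho I(\rho,\kappa)$, and take the spectral decomposition $\rho_M^{\otimes n} = \sum_{j=1}^{v_n}\lambda_{j,n}E_{j,n}$, where $v_n$ represents the number of eigenvalues of $\rho_M^{\otimes n}$. Let $\rho^{j,n}_{\mathrm{mix}}$ be the completely mixed state in the range of $E_{j,n}$, and let $p_{j,n} \stackrel{\mathrm{def}}{=} \lambda_{j,n}\,\mathrm{rank}\,E_{j,n}$. Then, we have $\rho_M^{\otimes n} = \sum_{j=1}^{v_n}p_{j,n}\rho^{j,n}_{\mathrm{mix}}$; therefore, $p_{j,n}$ is a probability distribution. Applying (8.46), we have

$$\sum_{j=1}^{v_n}p_{j,n}I(\rho^{j,n}_{\mathrm{mix}},\kappa^{\otimes n}) + 2\log v_n \ge I(\rho_M^{\otimes n},\kappa^{\otimes n}).$$

Thus, there exists an integer $j_n \in [1,v_n]$ such that

$$I(\rho^{j_n,n}_{\mathrm{mix}},\kappa^{\otimes n}) + 2\log v_n \ge I(\rho_M^{\otimes n},\kappa^{\otimes n}).$$

From Lemma 3.9, since $\frac{2}{n}\log v_n \to 0$, we obtain (9.47). This shows the existence of a code attaining the bound.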

Next, we show that there is no code that exceeds the RHS of (9.43). Given any pure state $\rho^{A',R}$ on $\mathcal{H}_{A'} \otimes \mathcal{H}_R$, a set of TP-CP maps $\{\varphi_e(j)\}$ from $\mathcal{H}_{A'}$ to $\mathcal{H}_A$, and a probability distribution $p_j$, we have

$$\sum_j p_j D\Bigl((\kappa\circ\varphi_e(j)\otimes\iota_R)(\rho^{A',R}) \Bigm\| \sum_j p_j(\kappa\circ\varphi_e(j)\otimes\iota_R)(\rho^{A',R})\Bigr) \le \max_\rho I(\rho,\kappa), \tag{9.48}$$

where we used (4.7) and (8.52). From (8.47) we obtain

$$\max_\rho I(\rho,\kappa^{\otimes n}) = n\max_\rho I(\rho,\kappa). \tag{9.49}$$

Using the Fano inequality appropriately as in (9.40), we show the $\le$ part in (9.43) (Exercise 9.6). ∎

Next, we examine the relation between the entanglement-assisted classical capacity $C^e_{c,e}(\kappa)$ and the classical capacity $C_c(\kappa)$. Let $\mathcal{H}_B$ be the output system of $\kappa$, $\mathcal{H}_A$ be the input system of $\kappa$, and $\mathcal{H}_R$ be a reference system of $\mathcal{H}_A$. Due to the relation (8.173), the quantity $C_d^{R\to B}(\rho)$ characterizes the classical capacity $C_c(\kappa)$ as

$$C_c(\kappa) = \sup_{x} C_d^{R\to B}((\kappa\otimes\iota_R)(|x\rangle\langle x|)).$$

Hence, from (8.178)

$$C_d^{R\to B}((\kappa\otimes\iota_R)(|x\rangle\langle x|)) \le I_{(\kappa\otimes\iota_R)(|x\rangle\langle x|)}(R:B), \tag{9.50}$$

$$C_c(\kappa) \le C^e_{c,e}(\kappa). \tag{9.51}$$


For the equality condition, the following theorem holds.

Theorem 9.6 When channel $\kappa$ is entanglement breaking and is written by a CONS $\{u_i^A\}$ on $\mathcal{H}_A$ as

$$\kappa(\rho) = \sum_i \langle u_i^A|\rho|u_i^A\rangle\,\rho_i^B, \tag{9.52}$$

the equality of (9.51) holds. Conversely, when the equality of (9.51) holds, the channel essentially has the form of (9.52), i.e., there exists a state $\rho_{\max}$ such that $I(\rho_{\max},\kappa) = C^e_{c,e}(\kappa)$ and $\kappa|_{\mathrm{supp}(\rho_{\max})}$ has the form of (9.52), where $\mathrm{supp}(\rho_{\max})$ is the support of $\rho_{\max}$. Further, in the case of (9.52), the capacity is calculated as

$$C^e_{c,e}(\kappa) = \max_p H\Bigl(\sum_i p_i\rho_i^B\Bigr) - \sum_i p_i H(\rho_i^B). \tag{9.53}$$

Notice that the channel $\kappa|_{\mathrm{supp}(\rho_{\max})}$ is not necessarily the same as the channel $\kappa$. Indeed, there exists a counterexample $\kappa$ such that the channel $\kappa$ does not have the form of (9.52) while the equality of (9.51) holds and, hence, the channel $\kappa|_{\mathrm{supp}(\rho_{\max})}$ has the form of (9.52) (Exercise 9.8).

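From Theorem 9.6, we see that even if the channel $\kappa$ is entanglement breaking, the equality does not necessarily hold, i.e., it is advantageous to use shared entanglement. This is because an entanglement breaking channel does not necessarily have the form (9.52). For example, we consider an entanglement breaking channel $\kappa$ with the form

$$\kappa(\rho) = \sum_i (\mathrm{Tr}\,M_i\rho)\,|u_i^B\rangle\langle u_i^B|, \tag{9.54}$$

where $\{u_i^B\}$ is a CONS on $\mathcal{H}_B$ and $M = \{M_i\}$ is a rank-one POVM on $\mathcal{H}_A$. Then, the capacity $C^e_{c,e}(\kappa)$ is calculated as

$$\begin{aligned}
C^e_{c,e}(\kappa) &= \sup_\rho\, H(\rho) + H\Bigl(\sum_i(\mathrm{Tr}\,M_i\rho)|u_i^B\rangle\langle u_i^B|\Bigr) - H\Bigl(\sum_i \mathrm{Tr}_A\bigl((M_i\otimes I_R)|x\rangle\langle x|\bigr)\otimes|u_i^B\rangle\langle u_i^B|\Bigr) \\
&= \sup_\rho H(\rho) = \log d_A,
\end{aligned}$$

where $|x\rangle\langle x|$ is a purification of $\rho$. However, when the POVM $M$ is given as

$$M_0 = \frac{1}{2}|e_0\rangle\langle e_0|,\quad M_1 = \frac{1}{2}|e_1\rangle\langle e_1|,\quad M_2 = \frac{1}{2}|e_+\rangle\langle e_+|,\quad M_3 = \frac{1}{2}|e_-\rangle\langle e_-|, \tag{9.55}$$

the classical capacity without shared entanglement is calculated as (Exercise 9.7)

$$C_c(\kappa) = C^e_c(\kappa) = \frac{1}{2}\log 2. \tag{9.56}$$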
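The value in (9.56) can be reproduced numerically along the route of Exercise 9.7. The sketch below assumes the reduction $C_c(\kappa) = \log 4 - \min_\theta f(\theta)$ stated in that exercise, where $f(\theta)$ is the Shannon entropy of the four outcome probabilities $(1\pm\cos\theta)/4$, $(1\pm\sin\theta)/4$; the grid search is an illustrative choice, not a proof.

```python
import numpy as np

def f(theta):
    """Entropy (nats) of the outcome distribution of the POVM (9.55)
    for a pure input state at angle theta in the e_0/e_+ plane."""
    probs = np.array([1 + np.cos(theta), 1 - np.cos(theta),
                      1 + np.sin(theta), 1 - np.sin(theta)]) / 4
    probs = probs[probs > 1e-15]          # convention 0 log 0 = 0
    return float(-np.sum(probs * np.log(probs)))

thetas = np.linspace(0.0, 2 * np.pi, 2001)   # grid containing theta = 0
f_min = min(f(t) for t in thetas)
c_c = np.log(4) - f_min
```

On this grid the minimum of `f` is $\frac{3}{2}\log 2$ (attained at $\theta = 0$), so `c_c` equals $\frac{1}{2}\log 2$, half of the entanglement-assisted value $\log d_A = \log 2$.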


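Proof of Theorem 9.6 Assume that condition (9.52) holds. Let $U_\theta$ be defined by $U_\theta \stackrel{\mathrm{def}}{=} \sum_j e^{i\theta_j}|u_j^A\rangle\langle u_j^A|$, $\theta = (\theta_1,\ldots,\theta_{d_A})$. Then, the channel $\kappa$ has the invariance $\kappa(\rho) = \kappa(U_\theta\rho U_\theta^*)$. Hence, $I(\rho,\kappa) = I(U_\theta\rho U_\theta^*,\kappa)$. From (8.45)

$$I(\rho,\kappa) \le I\Bigl(\int U_\theta\rho U_\theta^*\,d\theta,\ \kappa\Bigr).$$

Since $\int U_\theta\rho U_\theta^*\,d\theta$ has eigenvectors $\{u_j^A\}$, we have

$$\begin{aligned}
C^e_{c,e}(\kappa) &= \sup_p\, H\Bigl(\sum_j p_j|u_j^A\rangle\langle u_j^A|\Bigr) + H\Bigl(\sum_i \langle u_i^A|\Bigl(\sum_j p_j|u_j^A\rangle\langle u_j^A|\Bigr)|u_i^A\rangle\,\rho_i^B\Bigr) \\
&\qquad - H\Bigl(\sum_i \mathrm{Tr}_A\bigl((|u_i^A\rangle\langle u_i^A|\otimes I_R)|x\rangle\langle x|\bigr)\otimes\rho_i^B\Bigr) \\
&= \sup_p\, H(p) + H\Bigl(\sum_i p_i\rho_i^B\Bigr) - H\Bigl(\sum_i p_i|u_i^R\rangle\langle u_i^R|\otimes\rho_i^B\Bigr) \\
&= \sup_p\, H(p) + H\Bigl(\sum_i p_i\rho_i^B\Bigr) - H(p) - \sum_i p_i H(\rho_i^B) \\
&= \sup_p\, H\Bigl(\sum_i p_i\rho_i^B\Bigr) - \sum_i p_i H(\rho_i^B),
\end{aligned}$$

where $|x\rangle$ is a purification of $\sum_j p_j|u_j^A\rangle\langle u_j^A|$. Hence, we obtain (9.53). In particular, this capacity is equal to $C_c(\kappa)$. That is, the equality of inequality (9.51) holds.

Next, we assume that the equality of (9.51) holds. Then, there exists a state $\rho_{\max}$ such that $I(\rho_{\max},\kappa) = C^e_{c,e}(\kappa)$ and its purification $|x\rangle$ satisfies the equality in (9.50). Lemma 8.14 guarantees that there exist a CONS $\{u_i^R\}$ on $\mathcal{H}_R$, states $\rho_i^B$ on $\mathcal{H}_B$, and a probability distribution $p$ such that

$$\sum_i p_i|u_i^R\rangle\langle u_i^R|\otimes\rho_i^B = (\kappa\otimes\iota_R)(|x\rangle\langle x|).$$

Now, we let $\rho^R$ be the reduced density of $|x\rangle\langle x|$. Using relation (5.9),

$$(\kappa|_{\mathrm{supp}(\rho_{\max})}\otimes\iota_R)(|\Phi_d\rangle\langle\Phi_d|) = \sum_i d\,p_i\bigl(\sqrt{\rho^R}^{\,-1}|u_i^R\rangle\langle u_i^R|\sqrt{\rho^R}^{\,-1}\bigr)\otimes\rho_i^B,$$

where $d$ is the dimension of $\mathrm{supp}(\rho_{\max})$. Since $\sum_i d\,p_i\sqrt{\rho^R}^{\,-1}|u_i^R\rangle\langle u_i^R|\sqrt{\rho^R}^{\,-1}$ is the completely mixed state on $\mathrm{supp}(\rho_{\max})$, each $u_i^R$ is an eigenvector of $\rho^R$ with the eigenvalue $q_i$. Hence,

$$(\kappa|_{\mathrm{supp}(\rho_{\max})}\otimes\iota_R)(|\Phi_d\rangle\langle\Phi_d|) = \sum_i \frac{d\,p_i}{q_i}|u_i^R\rangle\langle u_i^R|\otimes\rho_i^B.$$

The discussion in Theorem 5.1 guarantees that the channel $\kappa|_{\mathrm{supp}(\rho_{\max})}$ has the form (9.52). ∎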
Exercises 


9.4 Show (9.41) using (5.86) and the monotonicity of the quantum relative entropy. 


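9.5 Show (9.48) using (8.52) and the inequality $D\bigl(\sum_j p_j(\kappa\circ\varphi_e(j)\otimes\iota_R)(\rho^{A',R}) \bigm\| \sum_j p_j\,\kappa\circ\varphi_e(j)(\rho^{A'})\otimes\rho^R\bigr) \ge 0$.


9.6 Show the $\le$ part of (9.43) by combining (9.48) and (9.49) with the Fano inequality.


9.7 Show that the channel $\kappa$ defined by (9.54) and (9.55) satisfies the equation (9.56) following the steps below.
(a) Show that $C_c(\kappa) = \log 4 - \min_\theta f(\theta)$, where

$$f(\theta) \stackrel{\mathrm{def}}{=} -\frac{1+\cos\theta}{4}\log\frac{1+\cos\theta}{4} - \frac{1-\cos\theta}{4}\log\frac{1-\cos\theta}{4} - \frac{1+\sin\theta}{4}\log\frac{1+\sin\theta}{4} - \frac{1-\sin\theta}{4}\log\frac{1-\sin\theta}{4}.$$

(b) Show that

$$\frac{df}{d\theta}(\theta) = \frac{\sin\theta}{4}\log\frac{1+\cos\theta}{1-\cos\theta} + \frac{\cos\theta}{4}\log\frac{1-\sin\theta}{1+\sin\theta},\qquad \frac{d^2f}{d\theta^2}(\theta) = \frac{\cos\theta}{4}\log\frac{1+\cos\theta}{1-\cos\theta} + \frac{\sin\theta}{4}\log\frac{1+\sin\theta}{1-\sin\theta} - 1.$$

(c) Show the following table and the equation (9.56).

$$\begin{array}{c|ccccc}
\theta & 0 & & \pi/4 & & \pi/2 \\ \hline
f(\theta) & \frac{3}{2}\log 2 & \nearrow & \frac{2+\sqrt{2}}{4}\log\frac{8}{2+\sqrt{2}} + \frac{2-\sqrt{2}}{4}\log\frac{8}{2-\sqrt{2}} & \searrow & \frac{3}{2}\log 2 \\
\frac{df}{d\theta}(\theta) & 0 & + & 0 & - & 0 \\
\frac{d^2f}{d\theta^2}(\theta) & \infty & \searrow & -1+\frac{\sqrt{2}}{2}\log(1+\sqrt{2}) & \nearrow & \infty
\end{array}$$

9.8 Let $\kappa_1$ and $\kappa_2$ be channels from systems $\mathcal{H}_{A,1}$ and $\mathcal{H}_{A,2}$ to system $\mathcal{H}_B$, respectively. Assume that the states $\rho_{\max,1} \stackrel{\mathrm{def}}{=} \mathrm{argmax}_\rho I(\rho,\kappa_1)$ and $\rho_{\max,2} \stackrel{\mathrm{def}}{=} \mathrm{argmax}_\rho I(\rho,\kappa_2)$ satisfy $\kappa_1(\rho_{\max,1}) = \kappa_2(\rho_{\max,2})$. Define the channel $\kappa$ from the system $\mathcal{H}_{A,1}\oplus\mathcal{H}_{A,2}$ to the system $\mathcal{H}_B$ as $\kappa(\rho) \stackrel{\mathrm{def}}{=} \kappa_1(P_1\rho P_1) + \kappa_2(P_2\rho P_2)$, where $P_i$ is the projection to $\mathcal{H}_{A,i}$. Show that $C^e_{c,e}(\kappa) = \max(C^e_{c,e}(\kappa_1), C^e_{c,e}(\kappa_2))$ by using (9.46).

Further, show the equality of (9.51), i.e., $C_c(\kappa) = C^e_{c,e}(\kappa)$, even though $\kappa_2$ does not satisfy (9.52), if $\kappa_1$ satisfies (9.52) and $C^e_{c,e}(\kappa_1) \ge C^e_{c,e}(\kappa_2)$.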


9.4 Quantum Channel Resolvability 


In this section, we examine the problem of approximating a given quantum state 
on the output of a c-q channel. In this problem, we choose a finite number of input 
signals and approximate a desired quantum state by the average output state with the 
uniform distribution on the chosen input signals. Then, the task of this problem is to 
choose the support of the uniform distribution at the input system as small as possible 
while approximating the desired state by the average output state as accurately as 
possible. 

The classical version of this problem is called channel resolvability. It was proposed by Han and Verdú [33, 34] in order to examine another problem called the identification code proposed by Ahlswede and Dueck [35]. The problem of approximating a quantum state at the output system of a c-q channel is analogously called quantum-channel resolvability. Hence, quantum-channel resolvability is expected to be useful for examining identification codes [36] for (classical-)quantum channels. Indeed, this problem essentially has been treated by Wyner [37] in order to evaluate the information of the eavesdropper. Hence, it is also a fundamental tool for the discussion of communications in the presence of an eavesdropper for the following reason. Regarding the channel connecting the sender to the eavesdropper, approximating two states on the output system is almost equivalent to making these two states indistinguishable for the eavesdropper. The details will be discussed in the next section.

Quantum-channel resolvability may be formulated as follows (Fig. 9.4) [33, 34]. Consider a c-q channel $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$ and a quantum state $\sigma \in W(\mathcal{P}(\mathcal{X}))$, and prepare a map $\varphi$ from $\{1,\ldots,M\}$ to the alphabet set $\mathcal{X}$. Now, the sender chooses an element $i$ of $\{1,\ldots,M\}$ according to the uniform distribution and sends the state $W_{\varphi(i)}$. The problem is then to determine how many ($M$) elements are required for sufficiently approximating the quantum state $\sigma$ by the output average state $W_\varphi \stackrel{\mathrm{def}}{=} \frac{1}{M}\sum_{i=1}^M W_{\varphi(i)}$ of the c-q channel $W$. (Here, we are allowed to use input elements duplicately.) The quality of the approximation is evaluated by the trace norm $\|W_\varphi - \sigma\|_1$. Here, we choose the trace norm as the criterion of the approximation because it represents how well two states can be discriminated, as seen in Lemma 3.2. If the number $M$ is sufficiently large, we can easily approximate the state $W_p = \sigma$ by the output average state $W_\varphi$. However, our aim is to approximate the state $W_p = \sigma$ with a small number $M$. One of the features of this problem is the following: even when the distribution $p_\varphi(x) \stackrel{\mathrm{def}}{=} \#\{\varphi^{-1}\{x\}\}/M$ at the input system is not close to $p$, the state $\sigma = W_p$ can be approximated by $W_{p_\varphi} = W_\varphi$ using the noise of channel $W$. In this case, our protocol is represented by $\Phi \stackrel{\mathrm{def}}{=} (M,\varphi)$, and its performance is given by $M = |\Phi|$ and

$$\varepsilon[\sigma,\Phi] \stackrel{\mathrm{def}}{=} \Bigl\|\Bigl(\frac{1}{M}\sum_{i=1}^M W_{\varphi(i)}\Bigr) - \sigma\Bigr\|_1.$$

Here, we consider the performance of the approximation in the worst case as $\max_{p\in\mathcal{P}(\mathcal{X})}\min_{\Phi:|\Phi|=M}\varepsilon[W_p,\Phi]$. Then, the asymptotic rate of its performance is given as the quantum-channel resolvability capacity⁴:

$$C_r(W) \stackrel{\mathrm{def}}{=} \inf\Bigl\{R \Bigm| \lim_{n\to\infty}\sup_{p\in\mathcal{P}(\mathcal{X}^n)}\inf_{\Phi:|\Phi|=e^{nR}}\varepsilon[W_p^{(n)},\Phi] = 0\Bigr\}. \tag{9.57}$$

Fig. 9.4 Channel resolvability
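The quantities in this formulation are easy to evaluate for a small example. The following sketch draws the map $\varphi$ at random and measures the trace-norm error between the uniform average $W_\varphi$ and the target $W_p$; the two-input qubit channel, the distribution `p`, and the sample sizes are hypothetical choices made only for illustration.

```python
import numpy as np

def trace_norm(A):
    """Trace norm of a Hermitian matrix: sum of absolute eigenvalues."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(A))))

def resolvability_error(W, p, phi):
    """epsilon[W_p, Phi] = || (1/M) sum_i W_{phi(i)} - W_p ||_1 for Phi = (M, phi)."""
    W_p = sum(px * Wx for px, Wx in zip(p, W))
    W_phi = sum(W[x] for x in phi) / len(phi)
    return trace_norm(W_phi - W_p)

# Hypothetical c-q channel with two input letters and qubit outputs.
W = [np.array([[0.9, 0.0], [0.0, 0.1]]),
     np.array([[0.5, 0.4], [0.4, 0.5]])]
p = [0.5, 0.5]

rng = np.random.default_rng(1)
errors = {}
for M in (4, 64, 4096):
    phi = rng.choice(len(W), size=M, p=p)   # phi(i) drawn i.i.d. from p
    errors[M] = resolvability_error(W, p, phi)
```

Random selection is the same device used in the proof of Lemma 9.2; for a single channel use the error simply tracks the deviation of the empirical input distribution from `p`, while the capacity (9.57) concerns how fast $M = e^{nR}$ must grow with the block length.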


Theorem 9.7 The quantum-channel resolvability capacity $C_r(W)$ satisfies

$$C_r(W) \le C_c(W) = \sup_p I(p,W). \tag{9.58}$$


To show Theorem 9.7, we prepare two lemmas as follows. 


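Lemma 9.2 For a given state $\sigma$, a distribution $p$ on the set $\mathcal{X}$, and a real number $s \in [0,1]$, there exists a map $\varphi$ from $\{1,\ldots,M\}$ to $\mathcal{X}$ satisfying

$$\begin{aligned}
\Bigl\|\Bigl(\frac{1}{M}\sum_{i=1}^M W_{\varphi(i)}\Bigr) - W_p\Bigr\|_1
&\le 4\sqrt{\sum_x p(x)\,\mathrm{Tr}\,W_x\{\kappa_\sigma(W_x) \ge M\sigma\}} + \sqrt{\frac{v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1}\kappa_\sigma(W_x)^2\{\kappa_\sigma(W_x) < M\sigma\}} && (9.59) \\
&\le \max(4\sqrt{2},\sqrt{2v})\,M^{-\frac{s}{2}}e^{\frac{s}{2}I_{1+s}(p,\sigma,W)}, && (9.60)
\end{aligned}$$

where $v$ is the number of eigenvalues of $\sigma$, $\mathrm{E}_x$ denotes the expectation under the distribution $p$, and $\kappa_\sigma$ is the pinching map concerning the matrix $\sigma$, which is defined in (1.14).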
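Since the pinching map $\kappa_\sigma$ carries much of the proof, a small numerical sketch may help. It assumes the usual definition $\kappa_\sigma(W) = \sum_j E_j W E_j$ with $E_j$ the spectral projections of $\sigma$, and checks the pinching inequality $W \le v\,\kappa_\sigma(W)$ that is used later in the proof; the function name, the tolerance, and the test matrices are illustrative assumptions.

```python
import numpy as np

def pinching(sigma, W, tol=1e-9):
    """Return (kappa_sigma(W), v): the pinched matrix and the number v of
    distinct eigenvalues of sigma (eigenvalues closer than tol are merged)."""
    w, V = np.linalg.eigh(sigma)            # eigenvalues in ascending order
    out = np.zeros_like(W, dtype=complex)
    start, v = 0, 0
    for k in range(1, len(w) + 1):
        if k == len(w) or w[k] - w[start] > tol:
            Vj = V[:, start:k]
            E = Vj @ Vj.conj().T            # spectral projection E_j
            out += E @ W @ E
            v += 1
            start = k
    return out, v

sigma = np.diag([0.5, 0.25, 0.25])          # two distinct eigenvalues
rng = np.random.default_rng(2)
G = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
W = G @ G.conj().T
W /= np.trace(W).real                       # random density matrix

pinched, v = pinching(sigma, W)
commutator = pinched @ sigma - sigma @ pinched
gap = np.linalg.eigvalsh(v * pinched - W).min()   # >= 0 by the pinching inequality
```

The pinched matrix commutes with $\sigma$ and keeps the trace of $W$, which is what allows the proof to treat $\kappa_\sigma(W_x)$ and $\sigma$ as simultaneously diagonal.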


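⁴The subscript r indicates "resolvability".

Lemma 9.2 will be shown after the proof of Theorem 9.7. Using Lemma 9.2, we can show the following lemma.

Lemma 9.3 For a given state $\sigma$, a distribution $p$ on the set $\mathcal{X}$, and a real number $s \in [0,1]$, there exists a map $\varphi$ from $\{1,\ldots,M\}$ to $\mathcal{X}$ satisfying

$$\sup_{p\in\mathcal{P}(\mathcal{X})}\inf_{\Phi:|\Phi|=M}\varepsilon[W_p,\Phi] \le \max(4\sqrt{2},\sqrt{2v})\,M^{-\frac{s}{2}}e^{\frac{s}{2}C^{\downarrow}_{1+s}(W)}, \tag{9.61}$$

where $v$ is the number of eigenvalues of $\sigma_{1+s|p_{1+s}}$. Remember that $p_{1+s}$ and $\sigma_{1+s|p}$ are defined in (4.62) and (4.23), respectively.

Proof of Lemma 9.3 We apply Lemma 9.2 to the case with $\sigma = \sigma_{1+s|p_{1+s}}$. Hence, we obtain

$$\inf_{\Phi:|\Phi|=M}\varepsilon[W_p,\Phi] \le \max(4\sqrt{2},\sqrt{2v})\,M^{-\frac{s}{2}}e^{\frac{s}{2}I_{1+s}(p,\sigma_{1+s|p_{1+s}},W)}.$$

Taking the supremum for $p$, we have

$$\begin{aligned}
\sup_{p\in\mathcal{P}(\mathcal{X})}\inf_{\Phi:|\Phi|=M}\varepsilon[W_p,\Phi]
&\le \max(4\sqrt{2},\sqrt{2v})\,M^{-\frac{s}{2}}e^{\frac{s}{2}\sup_{p\in\mathcal{P}(\mathcal{X})}I_{1+s}(p,\sigma_{1+s|p_{1+s}},W)} \\
&\stackrel{(a)}{=} \max(4\sqrt{2},\sqrt{2v})\,M^{-\frac{s}{2}}e^{\frac{s}{2}C^{\downarrow}_{1+s}(W)},
\end{aligned}$$

where (a) follows from (4.74). ∎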


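Proof of Theorem 9.7 Assume $R > C_c(W)$, and choose $M = e^{nR}$. Now, we denote the state $\sigma_{1+s|p_{1+s}}$ for the channel $W^{(n)}$ by $\sigma^{(n)}$, and the number of its eigenvalues by $v_n$. Since the additivity of $C^{\downarrow}_{1+s}(W)$ (4.76) implies $C^{\downarrow}_{1+s}(W^{(n)}) = nC^{\downarrow}_{1+s}(W)$, applying Lemma 9.3 to the channel $W^{(n)}$, we have

$$\sup_{p\in\mathcal{P}(\mathcal{X}^n)}\inf_{\Phi:|\Phi|=e^{nR}}\varepsilon[W_p^{(n)},\Phi] \le \max(4\sqrt{2},\sqrt{2v_n})\,e^{\frac{s}{2}n(C^{\downarrow}_{1+s}(W)-R)} \tag{9.62}$$

for $s \in [0,1]$. Then, due to the discussion in the solution of Exercise 4.74, we find that $\sigma^{(n)} = (\sigma^{(1)})^{\otimes n}$. Hence, the number $v_n$ increases only polynomially. Therefore, when $s > 0$ is chosen so that $C^{\downarrow}_{1+s}(W) < R$, the RHS of (9.62) goes to zero exponentially. ∎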


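Proof of Lemma 9.2 We prove (9.59) and (9.60) by employing the random coding method. Let $X \stackrel{\mathrm{def}}{=} (x_1,\ldots,x_M)$ be $M$ independent random variables subject to a probability distribution $p$ in $\mathcal{X}$. Consider a protocol $(M,\varphi)$ such that $\varphi(i) = x_i$. Denoting the expectation by $\mathrm{E}_X$, we will show that

$$\begin{aligned}
\mathrm{E}_X\Bigl\|\Bigl(\frac{1}{M}\sum_{i=1}^M W_{x_i}\Bigr) - W_p\Bigr\|_1
&\le 4\sqrt{\sum_x p(x)\,\mathrm{Tr}\,W_x\{\kappa_\sigma(W_x) \ge M\sigma\}} + \sqrt{\frac{v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1}\kappa_\sigma(W_x)^2\{\kappa_\sigma(W_x) < M\sigma\}} && (9.63) \\
&\le \max(4\sqrt{2},\sqrt{2v})\,M^{-\frac{s}{2}}e^{\frac{s}{2}I_{1+s}(p,\sigma,W)}. && (9.64)
\end{aligned}$$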


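Now, define $P_x \stackrel{\mathrm{def}}{=} \{\kappa_\sigma(W_x) \ge M\sigma\}$, $P_x^c \stackrel{\mathrm{def}}{=} I - P_x$, and $W'_p \stackrel{\mathrm{def}}{=} \sum_x p(x)P_x^c W_x P_x^c$. Exercise 6.8 implies

$$\|W_x P_x\|_1 \le \sqrt{\mathrm{Tr}\,W_x P_x}, \tag{9.65}$$

$$\|P_x W_x P_x^c\|_1 \le \sqrt{(\mathrm{Tr}\,P_x W_x)(\mathrm{Tr}\,W_x P_x^c)} \le \sqrt{\mathrm{Tr}\,W_x P_x}. \tag{9.66}$$

Since $W_p - W'_p = \sum_x p(x)(W_x P_x + P_x W_x P_x^c)$, we have $\|W_p - W'_p\|_1 \le \sum_x p(x)(\|W_x P_x\|_1 + \|P_x W_x P_x^c\|_1) \le 2\sum_x p(x)\sqrt{\mathrm{Tr}\,W_x P_x} \le 2\sqrt{\sum_x p(x)\,\mathrm{Tr}\,W_x P_x} = 2\sqrt{\mathrm{E}_x\mathrm{Tr}\,W_x P_x}$. Thus,

$$\begin{aligned}
&\mathrm{E}_X\Bigl\|\Bigl(\frac{1}{M}\sum_{i=1}^M W_{x_i}\Bigr) - W_p\Bigr\|_1 \\
&= \mathrm{E}_X\Bigl\|\frac{1}{M}\sum_{i=1}^M\bigl(W_{x_i}P_{x_i} + P_{x_i}W_{x_i}P_{x_i}^c\bigr) + (W'_p - W_p) + \Bigl(\frac{1}{M}\sum_{i=1}^M P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\Bigr)\Bigr\|_1 \\
&\le \mathrm{E}_X\Bigl\|\frac{1}{M}\sum_{i=1}^M\bigl(P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\bigr)\Bigr\|_1 + 2\sqrt{\mathrm{E}_x\mathrm{Tr}\,W_x P_x} + \mathrm{E}_X\frac{1}{M}\sum_{i=1}^M\bigl(\|W_{x_i}P_{x_i}\|_1 + \|P_{x_i}W_{x_i}P_{x_i}^c\|_1\bigr) \\
&\le \mathrm{E}_X\Bigl\|\frac{1}{M}\sum_{i=1}^M\bigl(P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\bigr)\Bigr\|_1 + 4\sqrt{\mathrm{E}_x\mathrm{Tr}\,W_x P_x}.
\end{aligned} \tag{9.67}$$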


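Thus, Exercise 6.10 yields

$$\Bigl\|\frac{1}{M}\sum_{i=1}^M\bigl(P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\bigr)\Bigr\|_1^2 \le \mathrm{Tr}\,\sigma^{-1/2}\Bigl(\frac{1}{M}\sum_i\bigl(P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\bigr)\Bigr)\sigma^{-1/2}\Bigl(\frac{1}{M}\sum_j\bigl(P_{x_j}^c W_{x_j}P_{x_j}^c - W'_p\bigr)\Bigr). \tag{9.68}$$

Since the random variables $x_i$ are independent of each other,

$$\begin{aligned}
&\mathrm{E}_X\mathrm{Tr}\,\sigma^{-1/2}\Bigl(\frac{1}{M}\sum_i\bigl(P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\bigr)\Bigr)\sigma^{-1/2}\Bigl(\frac{1}{M}\sum_j\bigl(P_{x_j}^c W_{x_j}P_{x_j}^c - W'_p\bigr)\Bigr) \\
&= \frac{1}{M^2}\sum_{i=1}^M\mathrm{E}_X\bigl[\mathrm{Tr}\,\sigma^{-1/2}P_{x_i}^c W_{x_i}P_{x_i}^c\sigma^{-1/2}P_{x_i}^c W_{x_i}P_{x_i}^c - \mathrm{Tr}\,\sigma^{-1/2}W'_p\sigma^{-1/2}W'_p\bigr] \\
&\le \frac{1}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1/2}P_x^c W_x P_x^c\sigma^{-1/2}P_x^c W_x P_x^c && (9.69) \\
&\le \frac{v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1/2}P_x^c\kappa_\sigma(W_x)P_x^c\sigma^{-1/2}P_x^c W_x P_x^c \\
&= \frac{v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1/2}P_x^c\kappa_\sigma(W_x)P_x^c\sigma^{-1/2}P_x^c\kappa_\sigma(W_x)P_x^c \\
&= \frac{v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1}\kappa_\sigma(W_x)^2\{\kappa_\sigma(W_x) < M\sigma\}. && (9.70)
\end{aligned}$$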
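Thus, using these relations as well as Jensen's inequality for $x \mapsto -\sqrt{x}$, we obtain

$$\begin{aligned}
\mathrm{E}_X\Bigl\|\frac{1}{M}\sum_{i=1}^M\bigl(P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\bigr)\Bigr\|_1
&\le \mathrm{E}_X\sqrt{\mathrm{Tr}\,\sigma^{-1/2}\Bigl(\frac{1}{M}\sum_i\bigl(P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\bigr)\Bigr)\sigma^{-1/2}\Bigl(\frac{1}{M}\sum_j\bigl(P_{x_j}^c W_{x_j}P_{x_j}^c - W'_p\bigr)\Bigr)} \\
&\le \sqrt{\mathrm{E}_X\mathrm{Tr}\,\sigma^{-1/2}\Bigl(\frac{1}{M}\sum_i\bigl(P_{x_i}^c W_{x_i}P_{x_i}^c - W'_p\bigr)\Bigr)\sigma^{-1/2}\Bigl(\frac{1}{M}\sum_j\bigl(P_{x_j}^c W_{x_j}P_{x_j}^c - W'_p\bigr)\Bigr)} \\
&\le \sqrt{\frac{v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1}\kappa_\sigma(W_x)^2\{\kappa_\sigma(W_x) < M\sigma\}}.
\end{aligned}$$

Since $\sqrt{a} + \sqrt{b} = 2(\sqrt{a}/2 + \sqrt{b}/2) \le 2\sqrt{a/2 + b/2} = \sqrt{2(a+b)}$ for $a, b \ge 0$,

$$\begin{aligned}
\mathrm{E}_X\Bigl\|\Bigl(\frac{1}{M}\sum_{i=1}^M W_{x_i}\Bigr) - W_p\Bigr\|_1
&\le 4\sqrt{\mathrm{E}_x\mathrm{Tr}\,W_x\{\kappa_\sigma(W_x) \ge M\sigma\}} + \sqrt{\frac{v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1}\kappa_\sigma(W_x)^2\{\kappa_\sigma(W_x) < M\sigma\}} \\
&\le \sqrt{32\,\mathrm{E}_x\mathrm{Tr}\,W_x\{\kappa_\sigma(W_x) \ge M\sigma\} + \frac{2v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1}\kappa_\sigma(W_x)^2\{\kappa_\sigma(W_x) < M\sigma\}}.
\end{aligned}$$

Since

$$\begin{aligned}
&32\,\mathrm{E}_x\mathrm{Tr}\,W_x\{\kappa_\sigma(W_x) \ge M\sigma\} + \frac{2v}{M}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-1}\kappa_\sigma(W_x)^2\{\kappa_\sigma(W_x) < M\sigma\} \\
&\le 32M^{-s}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-s}\kappa_\sigma(W_x)^{1+s}\{\kappa_\sigma(W_x) \ge M\sigma\} + 2vM^{-s}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-s}\kappa_\sigma(W_x)^{1+s}\{\kappa_\sigma(W_x) < M\sigma\} \\
&\le \max(32,2v)\,M^{-s}\mathrm{E}_x\mathrm{Tr}\,\sigma^{-s}\kappa_\sigma(W_x)^{1+s} \\
&= \max(32,2v)\,M^{-s}e^{sI_{1+s}(p,\sigma,\kappa_\sigma(W))} = \max(32,2v)\,M^{-s}e^{sI_{1+s}(p,\kappa_\sigma(\sigma),\kappa_\sigma(W))} \\
&\stackrel{(a)}{\le} \max(32,2v)\,M^{-s}e^{sI_{1+s}(p,\sigma,W)},
\end{aligned}$$

where $\kappa_\sigma(W)$ is the c-q channel $x \mapsto \kappa_\sigma(W_x)$, and (a) follows from (5.60). Hence, we obtain (9.64). ∎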


When channel $W$ is a classical channel and the map $p \mapsto W_p$ is one-to-one, $C_r(W)$ is equal to $C_c(W)$. To see the detail of this fact, we discuss the relation between identification codes and channel resolvability. Ahlswede and Dueck [35] introduced the identification capacity $C_i(W)$ as the upper limit of the rate of identification codes, and showed that $C_i(W) = C_c(W)$ for the classical case. Han and Verdú [33] tackled the strong converse capacity $C_i^\dagger(W)$ for identification codes. For this purpose, they introduced channel resolvability and showed that $C_i^\dagger(W) \le C_r(W)$ and $C_r(W) \le C_c(W)$ for the classical case. The combination of these relations yields that $C_i(W) = C_i^\dagger(W) = C_r(W) = C_c(W)$. The same relation can be expected for the c-q channel. Now, we consider the c-q channel $W$ when the map $p \mapsto W_p$ is one-to-one. The proof for $C_i^\dagger(W) \le C_r(W)$ by Han and Verdú [33] is still valid even for the c-q channel. Using the general method by Han and Verdú [33], we can show that $C_i^\dagger(W) \ge C_i(W) \ge C_c(W)$ for the c-q channel. Hence, combining these relations with Theorem 9.7, we can show $C_i(W) = C_i^\dagger(W) = C_r(W) = C_c(W)$ even for the c-q channel.⁵ Finally, we prove the direct part of Theorem 8.13 by using quantum-channel resolvability.


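Proof of Direct Part of Theorem 8.13 Choose a probabilistic decomposition $(p_x, \rho_x^A \otimes \rho_x^B)_{x\in\mathcal{X}}$ of the separable state $\rho$ as

$$C(\rho) = I_{\rho^{ABE}}(AB:E),\qquad \rho^{ABE} \stackrel{\mathrm{def}}{=} \sum_x p_x\,\rho_x^A\otimes\rho_x^B\otimes|u_x^E\rangle\langle u_x^E|. \tag{9.71}$$

For any $\epsilon > 0$, we let $M_n = e^{n(C(\rho)+\epsilon)}$. Due to Lemma 9.2, we can choose $M_n$ indexes $\varphi(1),\ldots,\varphi(M_n)$ in $\mathcal{X}^n$ such that

$$\Bigl\|\frac{1}{M_n}\sum_{i=1}^{M_n}\rho^{A,(n)}_{\varphi(i)}\otimes\rho^{B,(n)}_{\varphi(i)} - \rho^{\otimes n}\Bigr\|_1 \to 0,$$

which implies the direct part of Theorem 8.13. ∎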
Exercises 

9.9 Show that

$$\sup_{\{\Phi^{(n)}\}}\Bigl\{\lim_{n\to\infty}\frac{-\log\varepsilon[W_p^{\otimes n},\Phi^{(n)}]}{n} \Bigm| \lim_{n\to\infty}\frac{\log|\Phi^{(n)}|}{n} \le R\Bigr\} \ge \max_{0\le s\le 1}\frac{s}{2}\bigl(R - I^{\downarrow}_{1+s}(p,W)\bigr) \tag{9.72}$$

by using Lemma 9.2.

⁵Ahlswede and Winter [36] also showed that $C_i(W) = C_i^\dagger(W) = C_c(W)$ in a different way.


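9.10 Assume that all the output states $W_x$ commute. Show that there exists a map $\varphi : \{1,\ldots,M\} \to \mathcal{X}$ such that

$$\Bigl\|\Bigl(\frac{1}{M}\sum_{i=1}^M W_{\varphi(i)}\Bigr) - W_p\Bigr\|_1 \le 2\sum_x p(x)\,\mathrm{Tr}\,W_x\{W_x - CW_p \ge 0\} + \sqrt{\frac{C}{M}}$$

as a modification of (9.59) of Lemma 9.2.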


9.5 Quantum-Channel Communications with an Eavesdropper


9.5.1 C-Q Wiretap Channel 


The BB84 protocol [38] enables us to securely distribute a secret key using a quan- 
tum system. Experiments realizing this protocol have been performed [39, 40], with 
successful transmissions over 150 km [41, 42] via optical fibers. Therefore, the pro- 
tocol is almost at a practically usable stage. In the original proposal of the BB84 
protocol, it was assumed that there was no noise in the channel. However, a real 
channel always has some noise. In the presence of noise, the noise can be used by an 
eavesdropper to mask his/her presence while obtaining information from the channel. 
Therefore, it is necessary to communicate on the assumption that an eavesdropper 
may obtain a certain amount of information. 

This type of communication is called a wiretap channel and was first consid- 
ered by Wyner [37] for the classical case. Its quantum-mechanical extension, i.e., 
a classical-quantum wiretap channel (c-q wiretap channel), was examined by
Devetak [43]. In this communication, we require a code such that the authorized 
receiver can accurately recover the original message and the eavesdropper cannot 
obtain any information concerning the original message. Hence, one of the main 
problems in this communication is to find the bound of the communication rate of 
the code. Although this problem is not the same problem as the BB84 protocol itself, 
it will lead us to a proof of its security even in the presence of noise. 

Let $\mathcal{H}_B$ be the system received by the authorized receiver, $\mathcal{H}_E$ be the system received by the eavesdropper, and $W_x$ be an output state on the composite system $\mathcal{H}_B \otimes \mathcal{H}_E$ when the sender sends an alphabet $x \in \mathcal{X}$. Hence, the authorized receiver receives the state $W_x^B \stackrel{\mathrm{def}}{=} \mathrm{Tr}_E W_x$, and the eavesdropper receives the state $W_x^E \stackrel{\mathrm{def}}{=} \mathrm{Tr}_B W_x$, as in Fig. 9.5. In this case, we use a probabilistic code as follows. When the sender wishes to send a message $m \in \{1,\ldots,M\}$, he or she transmits the alphabet $x \in \mathcal{X}$ according to the probability distribution $Q^m$ in $\mathcal{X}$ depending on the message $m$. That is, the encoding process is described by a stochastic transition matrix $Q$ from $\{1,\ldots,M\}$ to $\mathcal{X}$. Then, the authorized receiver performs the $M$-valued POVM $Y = \{Y_{m'}\}_{m'=1}^M$ and receives the signal $m'$. Therefore, our protocol is described by $\Phi \stackrel{\mathrm{def}}{=} (M, Q, Y)$ and evaluated by the following three quantities. The first quantity is the size of the protocol $|\Phi| \stackrel{\mathrm{def}}{=} M$, and the second is the error probability of the authorized receiver $\varepsilon[\Phi] \stackrel{\mathrm{def}}{=} \frac{1}{M}\sum_{m=1}^M(1 - \mathrm{Tr}\,(W^B Q)_m Y_m)$, where $(W^B Q)_m \stackrel{\mathrm{def}}{=} \sum_{x\in\mathcal{X}}W_x^B Q_x^m$. The third quantity is the upper bound of the eavesdropper's information $I_E(\Phi) \stackrel{\mathrm{def}}{=} I(p^M_{\mathrm{mix}}, W^E Q)$, where $(W^E Q)_m \stackrel{\mathrm{def}}{=} \sum_{x\in\mathcal{X}}W_x^E Q_x^m$. Instead of $I_E(\Phi)$, we often employ $d_1(\Phi) \stackrel{\mathrm{def}}{=} \min_\sigma\sum_{m=1}^M\frac{1}{M}\|(W^E Q)_m - \sigma\|_1$.

Fig. 9.5 Wiretap channel (Alice, Bob, Eve)

Let us now examine the wiretap channel capacity, i.e., the bound of the communication rate $\frac{1}{n}\log|\Phi^{(n)}|$, for the asymptotically reliable protocol $\{\Phi^{(n)}\}$ with the stationary memoryless channel $W^{(n)}$, i.e., $n$ times use of $W$. This is given by

$$C_c^{B,E}(W) \stackrel{\mathrm{def}}{=} \sup_{\{\Phi^{(n)}\}}\Bigl\{\lim\frac{1}{n}\log|\Phi^{(n)}| \Bigm| \varepsilon[\Phi^{(n)}] \to 0,\ I_E(\Phi^{(n)}) \to 0\Bigr\}. \tag{9.73}$$


Theorem 9.8 (Devetak [43]) The wiretap channel capacity $C_c^{B,E}(W)$ satisfies

$$C_c^{B,E}(W) = \lim_{n\to\infty}\frac{1}{n}\sup_Q\sup_p\bigl(I(p, W^{B,(n)}Q) - I(p, W^{E,(n)}Q)\bigr). \tag{9.74}$$

If every $W_x^E$ can be written as $W_x^E = \kappa(W_x^B)$, using a completely positive map $\kappa$ from $\mathcal{H}_B$ to $\mathcal{H}_E$, it is called a quantum degraded channel, and it satisfies

$$C_c^{B,E}(W) = \sup_p\bigl(I(p, W^B) - I(p, W^E)\bigr). \tag{9.75}$$

It is also proved in Sect. 9.5.6. Further, a quantum degraded channel $(W^B, W^E)$ satisfies (Exercise 9.19)

$$I(Qp, W^B) - I(Qp, W^E) \ge \sum_i p_i\bigl(I(Q^i, W^B) - I(Q^i, W^E)\bigr). \tag{9.76}$$

That is, $I(p, W^B) - I(p, W^E)$ satisfies the concavity in this case.

Let us suppose that $W^B$ is given by a TP-CP map $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$, and the channel to the eavesdropper $W^E$ is given by a channel $\kappa^E$ to the environment of $\kappa$. Under this assumption, the eavesdropper's state is always a state reduced from the state on the environment. That is, he/she has less information than the environment system. Hence, the eavesdropper's information can be sufficiently estimated by treating the case where the eavesdropper's state is equal to the state on the environment. Now, we consider the set of input signals $\mathcal{X}$ given as the set of pure states on the input system. Then, for any input pure state $|x\rangle$ in $\mathcal{H}_A^{\otimes n}$, the states $W_x^{B,(n)}$ and $W_x^{E,(n)}$ are given by $\kappa^{\otimes n}(|x\rangle\langle x|)$ and $(\kappa^E)^{\otimes n}(|x\rangle\langle x|)$, respectively. In this scheme, any entangled state is allowed as the input state. From $H(W_x^B) = H(W_x^E)$ and (8.54), any state $\rho = \sum_i p_i|u_i\rangle\langle u_i|$ satisfies

$$\begin{aligned}
I(p, W^B) - I(p, W^E)
&= H(\kappa(\rho)) - \sum_i p_i H(W^B_{u_i}) - \Bigl(H(\mathrm{Tr}_B(U_\kappa\rho U_\kappa^*)) - \sum_i p_i H(W^E_{u_i})\Bigr) \\
&= H(\kappa(\rho)) - H(\mathrm{Tr}_B(U_\kappa\rho U_\kappa^*)) = I_c(\rho,\kappa).
\end{aligned} \tag{9.77}$$

Hence, letting $C_c^{e,B,E}(\kappa)$ be the asymptotic bound of the secure communication rate when any state on $\mathcal{H}_A^{\otimes n}$ is allowed as an input state, we can show that

$$C_c^{e,B,E}(\kappa) \ge \lim_{n\to\infty}\frac{1}{n}\max_{\rho\in\mathcal{S}(\mathcal{H}_A^{\otimes n})}I_c(\rho,\kappa^{\otimes n}). \tag{9.78}$$

In addition, the following monotonicity also holds with respect to the eavesdropper's information:

$$I(\{p_i\},\{\kappa^E\circ\kappa'(\rho_i)\}) \le I(\{p_i\},\{(\kappa\circ\kappa')^E(\rho_i)\}), \tag{9.79}$$

$$I(\{p_i\},\{\kappa^E(\rho_i)\}) \le I(\{p_i\},\{(\kappa'\circ\kappa)^E(\rho_i)\}). \tag{9.80}$$

9.5.2 Relation to BB84 Protocol 


Let us now relate these arguments to the BB84 protocol discussed earlier. In the BB84 protocol, the sender A transmits a state chosen from $e_0$, $e_1$, $e_+ \stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{2}}(e_0 + e_1)$, and $e_- \stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{2}}(e_0 - e_1)$ with an equal probability. The receiver B then chooses one of the two measurement bases $\{|e_0\rangle\langle e_0|, |e_1\rangle\langle e_1|\}$ and $\{|e_+\rangle\langle e_+|, |e_-\rangle\langle e_-|\}$ with an equal probability and performs this measurement on the received quantum system. Then, the authorized receiver B announces the basis of his/her measurement to the sender A via a public channel. The sender A tells the authorized receiver B whether the original state belongs to the set $\{e_0, e_1\}$ or $\{e_+, e_-\}$ via a public channel. This determines whether the basis used by the sender A coincides with the basis used by the authorized receiver B. The bases should agree for approximately half of the transmitted states; we denote the number of these states by $n$. They choose $\epsilon n$ bits randomly among these obtained $n$ bits and announce the information of these $\epsilon n$ bits using the public channel in order to verify whether these bits coincide with each other ($\epsilon$ is a suitably chosen positive real number). When they find a bit with disagreement, the sender A and the authorized receiver B conclude that an eavesdropper was present. Otherwise, both parties can conclude that they succeeded in sharing a secret key $X$ without divulging information to any third party. Finally, the sender encrypts the information $Y_A$ to be sent according to the conversion $Z = X + Y_A \pmod 2$. The encrypted message $Z$ may be decrypted according to the conversion $Y_B = Z + X \pmod 2$, thereby obtaining secure communication.
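The final encryption step is a one-time pad over bits. A minimal sketch, assuming the key $X$ has already been shared and is as long as the message (the variable names and the sample message are illustrative):

```python
import secrets

def xor_bits(a, b):
    """Bitwise addition modulo 2 of two equally long bit lists."""
    if len(a) != len(b):
        raise ValueError("key and message must have the same length")
    return [x ^ y for x, y in zip(a, b)]

message = [1, 0, 1, 1, 0, 0, 1, 0]                  # Y_A, the bits to be sent
key = [secrets.randbelow(2) for _ in message]       # X, the shared secret key
ciphertext = xor_bits(key, message)                 # Z = X + Y_A (mod 2)
recovered = xor_bits(ciphertext, key)               # Y_B = Z + X (mod 2)
```

Decryption works because adding $X$ twice modulo 2 cancels; the secrecy of $Y_A$ rests entirely on $X$ being uniformly random and unknown to any third party.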

In reality, the bits held by A and B may not agree due to noise even if an eavesdropper is not present. In this case, we must estimate how much of the information sent through the quantum channel $\kappa$ connecting the sender to the receiver is leaked to the third party. Consider a case in which the sender sends bits based on the basis $\{e_0, e_1\}$, and the receiver detects the bits through the measurement $E = \{|e_i\rangle\langle e_i|\}_{i=0}^1$. Now, let $X_A$ and $X_B$ be the random bits sent by the sender and the random bits detected by the authorized receiver through the measurement, respectively. When the sender transmits bit $i$, the authorized receiver obtains his/her outcome subject to the distribution $\mathrm{P}^E_{\kappa(e_i)}$. By performing the communication steps as described above, the stochastic transition matrix $Q$ joining $Y_A$ and $Y_B$ is given by

$$Q_0^0 = Q_1^1 \stackrel{\mathrm{def}}{=} \frac{1}{2}\mathrm{P}^E_{\kappa(e_0)}(0) + \frac{1}{2}\mathrm{P}^E_{\kappa(e_1)}(1),\qquad Q_1^0 = Q_0^1 \stackrel{\mathrm{def}}{=} \frac{1}{2}\mathrm{P}^E_{\kappa(e_0)}(1) + \frac{1}{2}\mathrm{P}^E_{\kappa(e_1)}(0),$$

which is the same as that for a noisy classical channel. Using a suitable coding protocol, the sender and the authorized receiver can communicate with almost no error and almost no information leakage.

Let us now estimate the amount of information leaked to the eavesdropper. In this case, it is impossible to distinguish the eavesdropping from the noise in the channel. For this reason, we assume that any information lost has been caused by the interception by the eavesdropper. Consider the case in which each bit is independently eavesdropped, i.e., the quantum channel from the state inputted by the sender to the state intercepted by the eavesdropper is assumed to be stationary memoryless. Therefore, if the sender transmits the state $e_i$, the eavesdropper obtains the state $\kappa^E(|e_i\rangle\langle e_i|)$, where $\kappa^E$ was defined in (5.7). Since the eavesdropper knows $Z$, he/she possesses the state on the composite system $\mathcal{H}_E \otimes \mathbb{C}^2$ consisting of the quantum system $\mathcal{H}_E$ and the classical system $\mathbb{C}^2$ corresponding to $Z$. For example, if $Y_A = i$, the state $W_i^E$ obtained by the eavesdropper is

$$W_0^E \stackrel{\mathrm{def}}{=} \begin{pmatrix}\frac{1}{2}\kappa^E(|e_0\rangle\langle e_0|) & 0 \\ 0 & \frac{1}{2}\kappa^E(|e_1\rangle\langle e_1|)\end{pmatrix},\qquad
W_1^E \stackrel{\mathrm{def}}{=} \begin{pmatrix}\frac{1}{2}\kappa^E(|e_1\rangle\langle e_1|) & 0 \\ 0 & \frac{1}{2}\kappa^E(|e_0\rangle\langle e_0|)\end{pmatrix}.$$

We may therefore reduce this problem to the c-q wiretap channel problem discussed previously [44]. In particular, if $\kappa$ is a Pauli channel $\kappa_p$,

$$I(p_{\mathrm{mix}}, Q) - I(p_{\mathrm{mix}}, W^E) = \log 2 - H(p), \tag{9.81}$$

which is a known quantity in quantum key distribution [45].
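Relation (9.81) can be verified numerically for a concrete Pauli channel. The sketch below assumes the Stinespring form $|\psi\rangle \mapsto \sum_k \sqrt{p_k}\,\sigma_k|\psi\rangle\otimes|k\rangle_E$ for the environment channel and the block-diagonal states $W_0^E$, $W_1^E$ given above; the probability vector `p` and all function names are illustrative assumptions.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def entropy(rho):
    """von Neumann entropy in nats."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def bob_state(psi, p):
    """kappa(|psi><psi|) for the Pauli channel with weights p."""
    proj = np.outer(psi, psi.conj())
    return sum(pk * (S @ proj @ S.conj().T) for pk, S in zip(p, PAULIS))

def eve_state(psi, p):
    """Environment state: entries <v_l|v_k> with v_k = sqrt(p_k) sigma_k |psi>."""
    vecs = [np.sqrt(pk) * (S @ psi) for pk, S in zip(p, PAULIS)]
    return np.array([[np.vdot(vl, vk) for vl in vecs] for vk in vecs])

def holevo_uniform(states):
    """Transmission information for the uniform distribution over `states`."""
    avg = sum(states) / len(states)
    return entropy(avg) - sum(entropy(s) for s in states) / len(states)

def rate_difference(p):
    """I(p_mix, Q) - I(p_mix, W^E) for inputs sent in the basis {e_0, e_1}."""
    e0 = np.array([1, 0], dtype=complex)
    e1 = np.array([0, 1], dtype=complex)
    flip = 0.5 * (bob_state(e0, p)[1, 1].real + bob_state(e1, p)[0, 0].real)
    Q = [np.diag([1 - flip, flip]), np.diag([flip, 1 - flip])]
    k0, k1 = eve_state(e0, p), eve_state(e1, p)
    zero = np.zeros((4, 4))
    W0 = 0.5 * np.block([[k0, zero], [zero, k1]])
    W1 = 0.5 * np.block([[k1, zero], [zero, k0]])
    return holevo_uniform(Q) - holevo_uniform([W0, W1])

p = [0.85, 0.05, 0.04, 0.06]          # assumed Pauli weights (I, X, Y, Z)
lhs = rate_difference(p)
rhs = np.log(2) - entropy(np.diag(p))
```

For these weights `lhs` and `rhs` agree to numerical precision, which is the content of (9.81): the bit-flip entropy cancels between the two terms and only $H(p)$ remains.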

In practice, it is not possible to estimate $\kappa$ completely using communications that use only $e_0$, $e_1$, $e_+$, $e_-$. However, it is possible to estimate $I(p, W^E)$.⁶ Since the encoding constructed in the proof of Theorem 9.8 depends on the form of $W^E$, it is desirable to construct a protocol that depends only on the value of $I(p, W^E)$.


9.5.3 Secret Sharing 


Let us consider an application of the above discussion to a protocol called secret 
sharing. In secret sharing, there are m receivers, and the encoded information sent 
by the sender can be recovered only by combining the information of m receivers. 
Therefore, a single receiver cannot obtain the encoded information [49, 50]. 
Denote the channel from the sender to each receiver by $W$ [51, 52]. The transmission information possessed by one receiver is equal to $I(p, W)$. The transmission information possessed by $m$ receivers is therefore $mI(p, W)$. Theorem 9.8 guarantees that, by performing the communication $n$ times, the sender can transmit almost $n(m-1)I(p, W)$ bits of information with no leakage to each receiver. That is, the problem is to ensure that the information possessed by an individual receiver approaches zero asymptotically. The random coding method used in the proof of Lemma 9.4 may be used to show the existence of such a code. Let $I_i(\Phi_X)$ be the information possessed by the $i$th receiver for the code $\Phi_X$. Let $\varepsilon[\Phi_X]$ be the average decoding error probability when the $m$ receivers are combined. Then, $E_X[\varepsilon[\Phi_X]]$ satisfies (9.93), and it can be shown that $\sum_{i=1}^m E_X[I_i(\Phi_X)] \le m(\varepsilon_2 \log d + \eta_0(\varepsilon_2))$. Therefore, there exists a code $\Phi$ such that $\varepsilon[\Phi] + \sum_{i=1}^m I_i(\Phi) \le \varepsilon_1 + m(\varepsilon_2 \log d + \eta_0(\varepsilon_2))$. Therefore, it is possible to
securely transmit $n(m-1)I(p, W)$ bits of information asymptotically. Further, we can consider the capacity with the following requirement: There are $m$ receivers, and the information can be recovered from the composite quantum states of any $n_1$ receivers. However, it cannot be recovered by only $n_2$ receivers. In this case, the capacity is $(n_1 - n_2)C(W)$. It can be shown by the combination of the proofs of Corollary 4.1 and Theorem 9.8.

⁶ By adding the states $e_+^* := \frac{1}{\sqrt{2}}(e_0 + ie_1)$ and $e_-^* := \frac{1}{\sqrt{2}}(e_0 - ie_1)$ in the transmission, and by adding the measurement $\{|e_+^*\rangle\langle e_+^*|, |e_-^*\rangle\langle e_-^*|\}$, it is possible to estimate $\kappa$. This is called the six-state method [46-48].
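The six-state method mentioned in the footnote adds a third basis to the four states $e_0, e_1, e_+, e_-$. The sketch below checks numerically that the three bases are mutually unbiased, i.e., that every pair of vectors from different bases has squared overlap 1/2. It assumes $e_\pm = (e_0 \pm e_1)/\sqrt{2}$ (this definition lies outside the present section) and represents vectors as plain tuples; all names are illustrative.

```python
import itertools
import math

s = 1 / math.sqrt(2)

# Three qubit bases; "circular" holds the two states added by the six-state method.
bases = {
    "computational": [(1, 0), (0, 1)],
    "plus_minus": [(s, s), (s, -s)],          # assumed definition of e_+ and e_-
    "circular": [(s, 1j * s), (s, -1j * s)],  # (e_0 + i e_1)/sqrt(2), (e_0 - i e_1)/sqrt(2)
}


def overlap_sq(u, v):
    """Squared absolute value of the inner product <u|v>."""
    inner = sum(complex(a).conjugate() * b for a, b in zip(u, v))
    return abs(inner) ** 2


# Overlaps between vectors taken from two different bases.
cross_overlaps = [
    overlap_sq(u, v)
    for a, b in itertools.combinations(list(bases), 2)
    for u in bases[a]
    for v in bases[b]
]

# Overlaps between the two vectors inside each basis (should vanish).
within_overlaps = [overlap_sq(b[0], b[1]) for b in bases.values()]
```

All twelve cross-basis overlaps equal 1/2 up to rounding, which is the property that lets the additional measurement supply the missing information about $\kappa$.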


9.5.4 Distillation of Classical Secret Key 


In addition, this approach can be applied to the distillation of a classical secret key from a shared state $\rho$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_E$, as in Fig. 9.6. Although we discussed a related topic in Sect. 8.14, the discussion in Sect. 8.14 considers only the information leakage, i.e., assumes that the information on the system $\mathcal{H}_A$ is the same as that on the system $\mathcal{H}_B$. In this subsection, the information on the system $\mathcal{H}_A$ is not necessarily the same as that on the system $\mathcal{H}_B$, i.e., there might exist noise between the two systems $\mathcal{H}_A$ and $\mathcal{H}_B$.

In the distillation of a classical secret key, our task is to generate a secret uniform random number shared by the two systems $\mathcal{H}_A$ and $\mathcal{H}_B$. That is, it is required that the eavesdropper's system $\mathcal{H}_E$ cannot hold any information concerning the distilled random number. Then, the optimal key rate with one-way ($A \to B$) communication is defined by


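\[
C_k^{A\to B-E}(\rho) := \sup_{\kappa_n} \left\{ \lim \frac{1}{n} \log L_n \;\middle|\;
\begin{array}{l}
\| \mathrm{Tr}_E\, \kappa_n(\rho_n) - \rho_{\mathrm{mix},L_n} \|_1 \to 0 \\
I_{\kappa_n(\rho_n)}(AB : E) \to 0
\end{array}
\right\}, \tag{9.82}
\]

where $\rho_{\mathrm{mix},L} := \frac{1}{L} \sum_{i=1}^{L} |e_i^A\rangle\langle e_i^A| \otimes |e_i^B\rangle\langle e_i^B|$. For this analysis, we define the quantity $C_d^{A\to B-E}(\rho)$:

\[
C_d^{A\to B-E}(\rho) := \max_M \Big( H(\rho^B) - \sum_i P^M_{\rho^A}(i) H(\rho_i^B) - H(\rho^E) + \sum_i P^M_{\rho^A}(i) H(\rho_i^E) \Big). \tag{9.83}
\]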


From this definition, the quantity $C_d^{A\to B-E}(\rho)$ satisfies the monotonicity concerning the one-way $A \to B$ operation. Further, we can show Condition E2 (continuity) similarly to $C_d^{A\to B}(\rho)$.

Using Theorem 9.8 and a discussion similar to Theorem 8.10, we obtain the 
following theorem. 


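Fig. 9.6 Tripartite state (shared by Alice, Bob, and Eve)

Theorem 9.9 (Devetak and Winter [53])

\[
C_k^{A\to B-E}(\rho) = \lim_{n\to\infty} \frac{C_d^{A\to B-E}(\rho^{\otimes n})}{n}. \tag{9.84}
\]

Further, if there exists a TP-CP map $\kappa$ from $\mathcal{H}_B$ to $\mathcal{H}_E$ such that

\[
\mathrm{Tr}_{AB}\, \rho (M \otimes I_{B,E}) = \kappa\big(\mathrm{Tr}_{AE}\, \rho (M \otimes I_{B,E})\big), \quad \forall M \ge 0, \tag{9.85}
\]

we have

\[
C_k^{A\to B-E}(\rho) = C_d^{A\to B-E}(\rho). \tag{9.86}
\]

In particular, when $\rho$ has the form $\rho^{AB} \otimes \rho^E$,

\[
C_k^{A\to B-E}(\rho) = C_d^{A\to B}(\rho^{AB}). \tag{9.87}
\]

Proof First, we prove the direct part:

\[
C_k^{A\to B-E}(\rho) \ge \lim_{n\to\infty} \frac{C_d^{A\to B-E}(\rho^{\otimes n})}{n}. \tag{9.88}
\]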

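For this purpose, we consider the following operation. Let $M$ be a POVM on $\mathcal{H}_A$ attaining the maximum in (9.83), and let $\{1,\ldots,l\}$ be its probability space. First, we define the channels $W^B$, $W^E$ as follows: the sender prepares the classical information $j \in \{1,\ldots,l\}$ and performs the measurement $M$ on $\mathcal{H}_A$. Hence, the sender obtains the datum $i$ as its outcome. He sends the classical information $k = i + j \bmod l$. Then, systems $B$ and $E$ receive this information $k$. Since the channels $W^B$, $W^E$ are described as

\[
W^B_j = \sum_i P^M_{\rho^A}(i)\, \rho^B_i \otimes |e_{i+j}\rangle\langle e_{i+j}|, \qquad
W^E_j = \sum_i P^M_{\rho^A}(i)\, \rho^E_i \otimes |e_{i+j}\rangle\langle e_{i+j}|,
\]

we obtain

\[
I(p_{\mathrm{mix}}, W^B) = I(P^M_{\rho^A}, \rho^B_{\cdot}) + H(p_{\mathrm{mix}}) - H(P^M_{\rho^A}),
\]
\[
I(p_{\mathrm{mix}}, W^E) = I(P^M_{\rho^A}, \rho^E_{\cdot}) + H(p_{\mathrm{mix}}) - H(P^M_{\rho^A}).
\]

Hence, Theorem 9.8 yields

\[
C_k^{A\to B-E}(\rho) \ge C_d^{A\to B-E}(\rho).
\]

Applying the same argument to $\rho^{\otimes n}$, we obtain (9.88).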
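The two information identities used in this step can be checked numerically in a simplified setting. The sketch below assumes that the conditional states $\rho_i^B$ are diagonal (commuting), so that they can be treated as probability vectors, and it models the masked channel as $W_j = \sum_i P(i)\, \rho_i \otimes |e_{i+j}\rangle\langle e_{i+j}|$ with uniform input $j$. The numbers and function names are hypothetical and serve only as an illustration.

```python
import math


def entropy(dist):
    """Shannon entropy in nats of a probability vector."""
    return -sum(x * math.log(x) for x in dist if x > 0)


def ensemble_information(P, states):
    """I(P, rho) = H(sum_i P(i) rho_i) - sum_i P(i) H(rho_i) for diagonal states."""
    dim = len(states[0])
    average = [sum(P[i] * states[i][b] for i in range(len(P))) for b in range(dim)]
    return entropy(average) - sum(P[i] * entropy(states[i]) for i in range(len(P)))


def masked_channel_information(P, states):
    """I(p_mix, W) for W_j = sum_i P(i) rho_i (x) |e_{i+j}><e_{i+j}|, with j uniform."""
    l, dim = len(P), len(states[0])
    outputs = []
    for j in range(l):
        out = [0.0] * (dim * l)  # joint distribution over pairs (b, k)
        for i in range(l):
            k = (i + j) % l
            for b in range(dim):
                out[b * l + k] += P[i] * states[i][b]
        outputs.append(out)
    average = [sum(out[t] for out in outputs) / l for t in range(dim * l)]
    return entropy(average) - sum(entropy(out) for out in outputs) / l


P = [0.5, 0.3, 0.2]                            # outcome distribution of the POVM M
rho_B = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # diagonal conditional states
lhs = masked_channel_information(P, rho_B)
rhs = ensemble_information(P, rho_B) + math.log(len(P)) - entropy(P)
```

In this commuting case `lhs` and `rhs` agree up to rounding. Since the same correction $H(p_{\mathrm{mix}}) - H(P)$ appears for both the receiver and the eavesdropper, it cancels in the difference of the two transmission informations.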
Next, we prove the converse part:

\[
C_k^{A\to B-E}(\rho) \le \lim_{n\to\infty} \frac{C_d^{A\to B-E}(\rho^{\otimes n})}{n}. \tag{9.89}
\]

As was mentioned above, the quantity $C_d^{A\to B-E}(\rho)$ satisfies the monotonicity and the continuity. Hence, from a discussion similar to that concerning Theorem 8.10, we can show inequality (9.89). Further, we can prove (9.86) based on a derivation similar to that for (9.75). ∎


9.5.5 Proof of Direct Part of C-Q Wiretap Channel Coding Theorem


We consider the attainability of the RHS of (9.74). Given a map $\varphi$ from $\{1,\ldots,M\} \times \{1,\ldots,L\}$ to $\mathcal{X}$, we define the distribution $Q^m$ by $\frac{1}{L} \sum_{l=1}^{L} \delta_{\varphi(m,l)}$, where $\delta_{\varphi(m,l)}$ is the deterministic distribution taking values only in $\{\varphi(m,l)\}$. Then, for a POVM $Y = \{Y_{m,l}\}_{(m,l)}$, we denote the code $(M, Q, Y)$ by $\Phi(\varphi, Y)$. Now, let us examine the following lemma.


Lemma 9.4 Given a distribution p, let v be the number of eigenvalues of Wr . Define 


def = a 
é, = min 2'*°(ML)‘e82-"-¥) 
se(0, 1] 


def : _s st E 
= min max (4v2, V2) L-2e2%i+s(P,W") 
se[0, 1] 


for integers M and L, where v is the number of eigenvalues of 0\+5\p,,,- There exist a 
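map $\varphi$ from $\{1,\ldots,M\} \times \{1,\ldots,L\}$ to $\mathcal{X}$ and a POVM $Y = \{Y_{(m,l)}\}_{(m,l)}$ such that

\[
\varepsilon[\Phi(\varphi, Y)] \le 3\varepsilon_1, \quad d_1(\Phi(\varphi, Y)) \le 3\varepsilon_2, \tag{9.90}
\]
\[
I_E(\Phi(\varphi, Y)) \le 3(\varepsilon_2 \log d + \eta_0(\varepsilon_2)). \tag{9.91}
\]

When we only focus on $\varepsilon[\Phi(\varphi, Y)]$ and $d_1(\Phi(\varphi, Y))$, there exist a one-to-one map $\varphi$ from $\{1,\ldots,M\} \times \{1,\ldots,L\}$ to $\mathcal{X}$ and a POVM $Y = \{Y_{(m,l)}\}_{(m,l)}$ such that

\[
\varepsilon[\Phi(\varphi, Y)] \le 2\varepsilon_1, \quad d_1(\Phi(\varphi, Y)) \le 2\varepsilon_2. \tag{9.92}
\]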


Proof Apply the random coding method. That is, for each pair $(m, l)$, let $\varphi(m, l)$ be given by the independent and identical random variables $x_{m,l}$ subject to the probability distribution $p$. Using the random variable $X = (x_{m,l})$, we denote $\varphi = (\varphi(m, l))$ by $\varphi_X$. Hence, this protocol is determined by $X$ and is denoted by $\Phi_X = \Phi(\varphi_X, Y_X)$. Denoting the expectation by $E_X$, (4.53) and (4.49) yield

\[
E_X[\varepsilon[\Phi_X]] \le \varepsilon_1. \tag{9.93}
\]
This is because the error probability can be reduced further than the case when ML 


messages are transmitted since only M messages ee Wom) }m are transmitted 
and decoded. Also (9.64) with o = 0 45|p,,, yields 
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M 1 1 L 
E E 
Exd\($x) < Ex }) alle > Wem — Well < ©. (9.94) 
m=1 


2M L 
I=1 


Applying Exercise 5.36 to (9.60), we immediately obtain 


1 
Ex | Daw Ox)nll | < & logd + m(e). 


Thus, 


a | =i 
Ex [Ie(®x)|=Ex (> a 20m’ -> On) 


m’=1 m=1 
E 
mo ) 


eed “1 
= Ex > ae «vF Oval) —Ex (> ap Ws Oxm 
<e2 logd + (2). (9.95) 


n=1 m=1 


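Since

\[
\Pr\Big( \{3E_X[\varepsilon[\Phi_X]] < \varepsilon[\Phi_X]\} \cup \{3E_X d_1(\Phi_X) < d_1(\Phi_X)\} \cup \{3E_X[I_E(\Phi_X)] < I_E(\Phi_X)\} \Big) < 1, \tag{9.96}
\]

the above evaluations prove the existence of a protocol $\Phi$ satisfying (9.90) and (9.91). Replacing the role of (9.96) by the inequality

\[
\Pr\Big( \{2E_X[\varepsilon[\Phi_X]] < \varepsilon[\Phi_X]\} \cup \{2E_X d_1(\Phi_X) < d_1(\Phi_X)\} \Big) < 1, \tag{9.97}
\]

we can show the existence of a map $\varphi$ from $\{1,\ldots,M\} \times \{1,\ldots,L\}$ to $\mathcal{X}$ and a POVM $Y = \{Y_{(m,l)}\}_{(m,l)}$ satisfying (9.92). ∎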


We now prove the direct part by applying Lemma 9.4 to $W^{B,(n)}$, $W^{E,(n)}$, and $p^n$.

First, we define $R := I(p, W^B) - I(p, W^E)$ and $R_1 := I(p, W^E)$. Let $M = M_n := e^{n(R-\delta)}$ and $L = L_n := e^{n(R_1+\delta/2)}$ for arbitrary $\delta > 0$. In this case, we denote $\varepsilon_1$, $\varepsilon_2$, $d$ by $\varepsilon_1^{(n)}$, $\varepsilon_2^{(n)}$, $d_n$, respectively.

We show that the RHSs of (9.90) and (9.91) converge to zero under the above conditions. Since $M_n L_n = e^{n(I(p,W^B)-\delta/2)}$, the discussion in Sect. 4.5 guarantees that $\varepsilon_1^{(n)}$ converges to zero. Using an argument similar to that in Sect. 9.4, we observe that $\varepsilon_2^{(n)}$ approaches zero exponentially. Since $d_n$ is the dimension, the equations $\log d_n = \log \dim \mathcal{H}^{\otimes n} = n \log \dim \mathcal{H}$ hold. Hence, the quantity $\varepsilon_2^{(n)} \log \dim \mathcal{H}^{\otimes n} + \eta_0(\varepsilon_2^{(n)})$ converges to zero. We can therefore show that $C_c^{B,E}(W) \ge I(p, W^B) - I(p, W^E)$. Finally, replacing $W^B$ and $W^E$ with $W^{B,(n)}Q$ and $W^{E,(n)}Q$, respectively, we can show that $C_c^{B,E}(W) \ge \frac{1}{n}\big(I(p, W^{B,(n)}Q) - I(p, W^{E,(n)}Q)\big)$, which implies that the RHS of (9.74) $\le$ the LHS of (9.74).
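The rate bookkeeping in this argument can be made concrete with a short sketch. It assumes the choice $M_n = e^{n(R-\delta)}$ and $L_n = e^{n(R_1+\delta/2)}$ with $R = I(p,W^B) - I(p,W^E)$ and $R_1 = I(p,W^E)$, and works with the exponents per channel use; the function name and the numerical values are hypothetical.

```python
def wiretap_code_exponents(info_b, info_e, delta):
    """Return ((1/n) log M_n, (1/n) log L_n) for the assumed choice of M_n and L_n.

    info_b and info_e stand for I(p, W^B) and I(p, W^E) in nats.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    rate = info_b - info_e                 # R = I(p, W^B) - I(p, W^E)
    message_exponent = rate - delta        # exponent of the number of messages M_n
    dummy_exponent = info_e + delta / 2    # exponent of the dummy randomness L_n
    return message_exponent, dummy_exponent


m_exp, l_exp = wiretap_code_exponents(info_b=0.6, info_e=0.2, delta=0.05)
total_exp = m_exp + l_exp  # exponent of M_n * L_n
```

With these values the total exponent stays below $I(p,W^B)$, so the receiver can still decode all $M_nL_n$ codewords, while the dummy exponent exceeds $I(p,W^E)$, which is what allows the eavesdropper's state to become nearly independent of the message.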
9.5.6 Proof of Converse Part of C-Q Wiretap Channel Coding Theorem


We prove the converse part of the theorem following Devetak [43]. Consider the sequence of protocols $\Phi^{(n)} = (M_n, Q_n, Y_n)$. Let $X_n$ be the random variable taking values in $\{1,\ldots,M_n\}$ subject to the uniform distribution $p^n_{\mathrm{mix}}$. Let $Z_n$ be the random variable corresponding to the message decoded by the receiver.

Then, the equations $\log M_n = H(X_n) = I(X_n : Z_n) + H(X_n | Z_n)$ hold. The Fano inequality yields $H(X_n | Z_n) \le \varepsilon[\Phi^{(n)}] \log M_n + \log 2$. We can evaluate $I(X_n : Z_n)$ to be

\begin{align*}
I(X_n : Z_n) &\le I(p^n_{\mathrm{mix}}, (W^{B,(n)} Q_n)) \\
&= I(p^n_{\mathrm{mix}}, (W^{B,(n)} Q_n)) - I(p^n_{\mathrm{mix}}, (W^{E,(n)} Q_n)) + I_E(\Phi^{(n)}) \\
&\le \sup_Q \sup_p \big[ I(p, (W^{B,(n)} Q)) - I(p, (W^{E,(n)} Q)) \big] + I_E(\Phi^{(n)}).
\end{align*}

Therefore,

\begin{align*}
\frac{1}{n} \log M_n \le{}& \frac{1}{n} \sup_Q \sup_p \big[ I(p, (W^{B,(n)} Q)) - I(p, (W^{E,(n)} Q)) \big] + \frac{1}{n} I_E(\Phi^{(n)}) \\
& + \varepsilon[\Phi^{(n)}] \frac{1}{n} \log M_n + \frac{1}{n} \log 2. \tag{9.98}
\end{align*}

Since $I_E(\Phi^{(n)}) \to 0$ and $\varepsilon[\Phi^{(n)}] \to 0$, the $\le$ part of (9.74) can be shown.


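Proof of (9.75) In what follows, we prove (9.75). If we can write $W^E_x = \kappa(W^B_x)$ using a completely positive map $\kappa$, then $I(Q^m, W^{B,(n)}) \ge I(Q^m, W^{E,(n)})$. Defining $(Qp) := \sum_m p(m) Q^m$, we have

\begin{align*}
& I(p, (W^{B,(n)} Q)) - I(p, (W^{E,(n)} Q)) \\
&= H\Big(\sum_m p(m) (W^{B,(n)} Q)_m\Big) - \sum_m p(m) H\big((W^{B,(n)} Q)_m\big)
 - H\Big(\sum_m p(m) (W^{E,(n)} Q)_m\Big) + \sum_m p(m) H\big((W^{E,(n)} Q)_m\big) \\
&= H\big(W^{B,(n)}_{(Qp)}\big) - \sum_m p(m) \sum_x Q^m(x) H\big(W^{B,(n)}_x\big) - \sum_m p(m) I(Q^m, W^{B,(n)}) \\
&\quad - H\big(W^{E,(n)}_{(Qp)}\big) + \sum_m p(m) \sum_x Q^m(x) H\big(W^{E,(n)}_x\big) + \sum_m p(m) I(Q^m, W^{E,(n)}) \\
&= I((Qp), W^{B,(n)}) - I((Qp), W^{E,(n)}) - \sum_m p(m) \big( I(Q^m, W^{B,(n)}) - I(Q^m, W^{E,(n)}) \big) \\
&\le I((Qp), W^{B,(n)}) - I((Qp), W^{E,(n)}). \tag{9.99}
\end{align*}

Using Exercise 9.12, we can obtain $\frac{1}{n} \sup_p \big[ I(p, W^{B,(n)}) - I(p, W^{E,(n)}) \big] \le \sup_p \big[ I(p, W^B) - I(p, W^E) \big]$, from which we obtain (9.75). ∎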


Exercises 
9.11 Show (9.81) using Exercise 8.29. 


9.12 Consider two c-q channels $W^B$ and $W^{B'}$ defined in $\mathcal{X}$ and $\mathcal{X}'$ and two TP-CP maps $\kappa$ and $\kappa'$. Define two c-q channels $W^E_x = \kappa(W^B_x)$ and $W^{E'}_x = \kappa'(W^{B'}_x)$. Consider an arbitrary probability distribution $q$ in $\mathcal{X} \times \mathcal{X}'$, and let $p$ and $p'$ be its marginal distributions in $\mathcal{X}$ and $\mathcal{X}'$, respectively. Show that

\[
I(q, W^B \otimes W^{B'}) - I(q, W^E \otimes W^{E'})
\le I(p, W^B) - I(p, W^E) + I(p', W^{B'}) - I(p', W^{E'}). \tag{9.100}
\]


9.13 Prove (9.78) referring to the discussions in Sects. 9.5.5 and 9.5.6. 


9.14 Replace the condition $I_E(\Phi^{(n)}) \to 0$ by another condition $\frac{I_E(\Phi^{(n)})}{n} \to 0$ in the definitions of $C_c^{B,E}(W)$ and $C_c^{e,E}(\kappa)$. Show that the capacity is the same as the original one.


9.15 Consider the secure communication via a quantum channel $\kappa$ when unlimited shared entanglement between the sender and the receiver is available. We denote the asymptotic bound of the secure communication rate by $C_{c,e}^{e,E}(\kappa)$. Show that $C_{c,e}^{e,E}(\kappa) = C_{c,e}^{e}(\kappa) = \max_\rho I(\rho, \kappa)$.


9.16 Prove (9.79) and (9.80) by expressing the environment of the composite map $\kappa' \circ \kappa$ in terms of the environment systems of the maps $\kappa'$ and $\kappa$.


9.17 Show that the capacity is equal to the original one even though the condi- 
tion Iz(®“) — 0 in the definition is replaced by another condition ¢¢,4[®” = 


E . E : . . 
>; Dit ier eae 21) _, 0. Here, use the Fannes inequality (5.92). 


9.18 Show that the capacity is equal to the original one even though the above condition is replaced with the condition that $\varepsilon_{E,w}[\Phi] := \sup_i \sup_j d_1((W^E Q)_i, (W^E Q)_j)$ converges to 0.


9.19 Show that $I(Qp, W^B) - I(Qp, W^E) - \sum_i p_i \big( I(Q^i, W^B) - I(Q^i, W^E) \big) = I(p, W^B Q) - I(p, W^E Q)$ for a quantum degraded channel $(W^B, W^E)$. Also show (9.76).
9.6 Channel Capacity for Quantum-State Transmission


9.6.1 Conventional Formulation 


Let us consider the problem of finding how large a quantum system can be transmitted with negligible error via a given noisy quantum channel $\kappa$ using encoding and decoding quantum operations. This problem is important for preventing noise from affecting a quantum state in a reliable quantum computation. Hence, this problem is a crucial one for realizing quantum computers and is called quantum error correction.

It is a standard approach in this problem to algebraically construct particular codes [54-60], [61, Chapter 9]. However, the achievability of the optimal rate is shown only by employing the random coding method [43, 62, 63]. Although this method is not directly applicable in a practical sense, it is nevertheless an important theoretical result. In the discussion below, we will not discuss the former algebraic approach and concentrate only on the theoretical bounds.

Let us now formally state the problem of transmitting quantum systems accurately via a quantum channel $\kappa$ from an input quantum system $\mathcal{H}_A$ to an output quantum system $\mathcal{H}_B$. When the quantum system $\mathcal{H}$ is to be sent, the encoding and decoding operations are given as TP-CP maps $\tau$ and $\nu$ from $\mathcal{H}$ to $\mathcal{H}_A$ and from $\mathcal{H}_B$ to $\mathcal{H}$, respectively. By combining these operations, it is possible to protect the quantum state from noise during transmission. We may therefore express our protocol by $\Phi = (\mathcal{H}, \tau, \nu)$. The quality of our protocol may be measured by the size $|\Phi| := \dim \mathcal{H}$ of the system to be sent. The accuracy of transmission is measured by $\varepsilon_1[\Phi] := \max_{u \in \mathcal{H}^1} \big[ 1 - F^2(u, \nu \circ \kappa \circ \tau(u)) \big]$ ($\mathcal{H}^1 := \{u \in \mathcal{H} \mid \|u\| = 1\}$). We often focus on $\varepsilon_2[\Phi] := \big[ 1 - F_e^2(\rho_{\mathrm{mix}}, \nu \circ \kappa \circ \tau) \big]$ as another criterion of accuracy. Let us now examine how large a communication rate $\frac{1}{n} \log |\Phi^{(n)}|$ of our code $\Phi^{(n)}$ is possible for a given channel $\kappa^{\otimes n}$ under the condition that $\varepsilon_1[\Phi^{(n)}]$ or $\varepsilon_2[\Phi^{(n)}]$ approaches zero asymptotically. Then, two kinds of quantum capacities $C_{q,1}$ and $C_{q,2}$⁷ are defined as

\[
C_{q,i}(\kappa) := \sup_{\{\Phi^{(n)}\}} \left\{ \lim \frac{1}{n} \log |\Phi^{(n)}| \;\middle|\; \lim_{n\to\infty} \varepsilon_i[\Phi^{(n)}] = 0 \right\}, \quad i = 1, 2. \tag{9.101}
\]

Theorem 9.10 Two channel capacities $C_{q,1}$ and $C_{q,2}$ are calculated as

\[
C_{q,1}(\kappa) = C_{q,2}(\kappa) = \lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n}). \tag{9.102}
\]

⁷ Since "quantum" states are to be sent, the capacities are called quantum capacities. The subscript q indicates that "quantum" states are to be sent.

Fig. 9.7 A quantum channel with an eavesdropper and a quantum channel with noise (left panel: sender, receiver, and wire-tapper; right panel: sender and receiver)
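To make the second accuracy criterion concrete, the following sketch evaluates $\varepsilon_2 = 1 - F_e^2(\rho_{\mathrm{mix}}, \kappa)$ for a single qubit with trivial (identity) encoder and decoder. It relies on the standard Kraus-operator identity $F_e^2(\rho, \kappa) = \sum_k |\mathrm{Tr}\, \rho E_k|^2$; the depolarizing parametrization and all function names are assumptions made for illustration only.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def scale(c, a):
    return [[c * x for x in row] for row in a]


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def depolarizing_kraus(q):
    """Hypothetical parametrization: identity with weight 1 - 3q/4, each Pauli with q/4."""
    return [scale(math.sqrt(1 - 3 * q / 4), I2)] + [scale(math.sqrt(q / 4), P) for P in (X, Y, Z)]


def epsilon_2(kraus, rho):
    """1 - F_e^2(rho, kappa), using F_e^2 = sum_k |Tr(rho E_k)|^2."""
    fe_squared = sum(abs(trace(matmul(rho, e))) ** 2 for e in kraus)
    return 1 - fe_squared


rho_mix = [[0.5, 0], [0, 0.5]]
eps2_depolarizing = epsilon_2(depolarizing_kraus(0.2), rho_mix)
eps2_identity = epsilon_2([I2], rho_mix)
```

For this parametrization only the identity Kraus operator has a nonzero trace against $\rho_{\mathrm{mix}}$, so the error equals $3q/4$; a good code $(\tau, \nu)$ is one for which the corresponding quantity for $\nu \circ \kappa^{\otimes n} \circ \tau$ tends to zero.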


We now give a proof of the above theorem. Our strategy will be to relate this theorem to the problem of transmitting classical information in the presence of an eavesdropper, as examined in Sect. 9.5. In this approach, as shown in the proof of the theorem, protecting a quantum state from noise is equivalent in a sense to sending classical information without wiretapping when regarding the noise as the wiretapper (Fig. 9.7). To this end, consider the Stinespring representation $(\mathcal{H}_C, \rho_0 = |u_0\rangle\langle u_0|, U_\kappa)$ of $\kappa$. We fix a state $\rho_A$ on $\mathcal{H}_A$, and diagonalize $\rho_A$ as $\rho_A = \sum_{x \in \mathcal{X}} p_x |u_x\rangle\langle u_x|$. For the given basis $\{u_x\}$ of $\mathcal{H}_A$, we focus on c-q channels $W^B_x := \kappa(|u_x\rangle\langle u_x|)$ and $W^E_x := \kappa_E(|u_x\rangle\langle u_x|)$ on $\mathcal{H}_B$ and $\mathcal{H}_E$, respectively. Then, the following lemma holds.


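Lemma 9.5 Now, we regard $p$ as a distribution on $\mathcal{X}$. Let $M$ and $L$ be arbitrary integers and $v$ be the number of eigenvalues of $W^E_p$. There exist an isometric map $V$ from $\mathbb{C}^M$ to $\mathcal{H}_A$ and a TP-CP map $\nu$ from $\mathcal{H}_B$ to $\mathbb{C}^M$ such that

\[
F_e(\rho_{\mathrm{mix}}, \nu \circ \kappa \circ \kappa_V) \ge 1 - 4\big(\sqrt{2\varepsilon_2} + \sqrt{2\varepsilon_1}\big)^2, \tag{9.103}
\]

where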
€ def min 2'+5 (ML)§ e784-s(2.W") 
se[0,1] 


def - = sot E 
= min max (4v2, v20) L-s/eatts(.W") 
sELV, 


To show Lemma 9.5, we prepare the following lemmas. 


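Lemma 9.6 Let $\kappa$ be a TP-CP map from a system $\mathcal{H}_A$ to $\mathcal{H}_B := \mathbb{C}^M$ and $|\Phi\rangle$ be the maximally entangled state on $\mathcal{H}_C \otimes \mathcal{H}_B$, where $\mathcal{H}_C = \mathbb{C}^M$. When a pure entangled state $|\Psi\rangle$ between $\mathcal{H}_C$ and $\mathcal{H}_A$ satisfies

\[
F(\kappa(|\Psi\rangle\langle\Psi|), |\Phi\rangle\langle\Phi|) \ge 1 - \delta^2, \tag{9.104}
\]

there exists an isometric map $V$ from $\mathcal{H}_C$ to $\mathcal{H}_A$ such that

\[
F_e(\rho_{\mathrm{mix},C}, \kappa \circ \kappa_V) \ge 1 - 4\delta^2. \tag{9.105}
\]

Proof We choose the isometric map $V$ from $\mathbb{C}^M$ to $\mathcal{H}_A$ such that

\[
F(\rho_{\mathrm{mix},C}, \mathrm{Tr}_A |\Psi\rangle\langle\Psi|) = F(V|\Phi\rangle\langle\Phi|V^*, |\Psi\rangle\langle\Psi|).
\]

Thus, we have

\begin{align*}
F(\kappa(|\Psi\rangle\langle\Psi|), \kappa \circ \kappa_V(|\Phi\rangle\langle\Phi|)) &\ge F(|\Psi\rangle\langle\Psi|, \kappa_V(|\Phi\rangle\langle\Phi|)) \\
&= F(\rho_{\mathrm{mix},C}, \mathrm{Tr}_A |\Psi\rangle\langle\Psi|) = F(\mathrm{Tr}_B |\Phi\rangle\langle\Phi|, \mathrm{Tr}_B\, \kappa(|\Psi\rangle\langle\Psi|)) \\
&\ge F(|\Phi\rangle\langle\Phi|, \kappa(|\Psi\rangle\langle\Psi|)).
\end{align*}

Applying the triangle inequality of Bures' distance $b(\rho, \sigma)^2 = 1 - F(\rho, \sigma)$, we have

\begin{align*}
\sqrt{1 - F_e(\rho_{\mathrm{mix},C}, \kappa \circ \kappa_V)} &= b(|\Phi\rangle\langle\Phi|, \kappa \circ \kappa_V(|\Phi\rangle\langle\Phi|)) \\
&\le b(|\Phi\rangle\langle\Phi|, \kappa(|\Psi\rangle\langle\Psi|)) + b(\kappa(|\Psi\rangle\langle\Psi|), \kappa \circ \kappa_V(|\Phi\rangle\langle\Phi|)) \\
&\le 2 b(\kappa(|\Psi\rangle\langle\Psi|), |\Phi\rangle\langle\Phi|) = 2\sqrt{1 - F(\kappa(|\Psi\rangle\langle\Psi|), |\Phi\rangle\langle\Phi|)} \le 2\delta.
\end{align*}

Hence, we obtain (9.105). ∎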


Lemma 9.7 Under the same assumption as Lemma 9.5, there exist an entangled 
state |W) between Hc := C” and Ha and aTP-CP map v from Hg to Hp := C” 
such that 


2 
Felpmincs¥ 0 wY)(WD) 21—(V2e0+V2e1) . (9.106) 


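Proof Lemma 9.7 is shown by the combination of Theorem 8.17 and Lemma 9.4. First, for the given basis $\{u_x\}$ of $\mathcal{H}_A$, we focus on c-q channels $W^B_x := \kappa(|u_x\rangle\langle u_x|)$ and $W^E_x := \kappa_E(|u_x\rangle\langle u_x|)$ on $\mathcal{H}_B$ and $\mathcal{H}_E$, respectively. That is, $\kappa$ and $\kappa_E$ can be written by the partial trace and the isometry $U$ from $\mathcal{H}_A$ to $\mathcal{H}_B \otimes \mathcal{H}_E$, respectively.

Due to Lemma 9.4, there exist a map $\varphi$ from $\{1,\ldots,M\} \times \{1,\ldots,L\}$ to $\mathcal{X}$ and a POVM $Y = \{Y_{(m,l)}\}_{(m,l)}$ satisfying

\[
\varepsilon[\Phi(\varphi, Y)] \le 2\varepsilon_1, \quad d_1(\Phi(\varphi, Y)) \le 2\varepsilon_2. \tag{9.107}
\]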


Now, we apply Theorem 8.17 in the following way. 74, and 71,4, are defined as the 
Hilbert space spanned by {|m) 4,}*_, and {|/) 4, fis, respectively. The reference sys- 


tem 7/z is chosen to be 1g. Define the state |W’) := Tut Dim. |) asl) Az ltgon.y) A+ 
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Then, we define the state p := U,,|W’, uo) (W’, uo|U; on the system Hy, ® Ha, ® 
He ® He. Then, the state p satisfies 2€;-bit security for 714, and 2€2-bit recoverabil- 
ity for H4, ® Ha,. Let {v;} be a basis mutually unbiased to {|/) 4, Is of 7H(4,. There- 
fore, Theorem 8.17 guarantees the existence of a TP-CP map Kk; : S(Hg) > S(Ha,) 
dependently of / such that 


L 


1 
De LPL evil pa, arelon)) ® Lei) (vil, |PM(P|) 
l=1 


21 (J2a+ Via) - (9.108) 


Then, we choose an integer /’ such that 
2 
Fn (Livrlpa,azeler)), 1®)(®) = 1 (V2e+V2e) . —— @.109) 


Define the entangled state |W) between Hc := C™ and Ha. as 


L 


io 
|W) = 2 (onl) a. Fy y |m) a,|tioony)a = VL (vp |®’). (9.110) 


m=1 


Then, we have 
F(Ky (WY) (Y1), |P)(@) = F(LKr (uy |Pa, ara lor)), |f)(P I), (9.111) 


which implies (9.106). a 


Proof of Lemma 9.5 Lemma 9.5 can be shown by the direct combination of Lemmas 
9.6 and 9.7. | 


Proof of the direct part of Theorem 9.10 We are now ready to prove the direct part of Theorem 9.10, i.e., $C_{q,2}(\kappa) \ge$ (RHS of (9.102)). Now, we fix the distribution $p$ on $\mathcal{X}$. Assume $R < I(p, W^B) - I(p, W^E)$ and choose $\delta := I(p, W^B) - I(p, W^E) - R$. We choose $M$ and $L$ to be $e^{nR}$ and $e^{n(I(p,W^E)+\delta/2)}$. Then, we have $ML = e^{n(I(p,W^B)-\delta/2)}$. Then, we choose a sufficiently large $n$ such that

\[
s\Big( I_{1+s}(p, W^E) - I(p, W^E) - \frac{\delta}{2} \Big) < 0, \qquad
s\Big( -I_{1-s}(p, W^B) + I(p, W^B) - \frac{\delta}{2} \Big) < 0 \tag{9.112}
\]

with a suitable $s \in (0, 1)$. Now, we apply Lemma 9.5 with $\sigma = (W^E_p)^{\otimes n}$. Then, there exists a code satisfying (9.103) when the distribution is given as the independent and identical distribution of $p$. Due to (9.112), $\varepsilon_1$ and $\varepsilon_2$ go to zero exponentially. Therefore, the rate $I(p, W^B) - I(p, W^E)$ is achievable. That is, we obtain $C_{q,2}(\kappa) \ge I(p, W^B) - I(p, W^E)$.

Since $I(p, W^B) - I(p, W^E) = I_c(\rho, \kappa)$ from (9.99), we may apply the same argument to $\kappa^{\otimes n}$ to obtain $C_{q,2}(\kappa) \ge$ (RHS of (9.102)), which completes the proof. The proof of $C_{q,2}(\kappa) = C_{q,1}(\kappa)$ is left as Exercises 9.20 and 9.21. ∎


Next, we give a proof of the converse part by a method similar to that in Theorem 8.10, i.e., we show that

\[
C_{q,2}(\kappa) \le \lim_{n\to\infty} \frac{1}{n} \max_\rho I_c(\rho, \kappa^{\otimes n}). \tag{9.113}
\]

In this method, for any class of local operations $C$, we focus on the function $C(\kappa)$ of a channel $\kappa$ from $\mathcal{H}_A$ to $\mathcal{H}_B$ that satisfies the following conditions.

C1 (Normalization) $C(\iota_d) = \log d$ for an identity map $\iota_d$ with a size $d$.

C2 (Monotonicity) $C(\kappa' \circ \kappa \circ \kappa_U) \le C(\kappa)$ holds for any TP-CP map $\kappa'$ and any isometry $U$.

C3 (Continuity) When any two channels $\kappa_{1,n}$ and $\kappa_{2,n}$ from a system $\mathcal{H}_n$ to another system $\mathcal{H}'_n$ satisfy $\max_x 1 - F^2\big((\kappa_{1,n} \otimes \iota_R)(|x\rangle\langle x|), (\kappa_{2,n} \otimes \iota_R)(|x\rangle\langle x|)\big) \to 0$, we have $\frac{|C(\kappa_{1,n}) - C(\kappa_{2,n})|}{\log(\dim \mathcal{H}_n \cdot \dim \mathcal{H}'_n)} \to 0$, where the system $\mathcal{H}_R$ is the reference system of $\mathcal{H}_n$.

C4 (Convergence) The quantity $\frac{C(\kappa^{\otimes n})}{n}$ converges as $n \to \infty$.

Based on only the above conditions, we can prove the following lemma.

Lemma 9.8 When $C$ satisfies all of the above conditions, we have

\[
C_{q,2}(\kappa) \le C^\infty(\kappa) \left( := \lim_{n\to\infty} \frac{C(\kappa^{\otimes n})}{n} \right). \tag{9.114}
\]

Since $\max_\rho I_c(\rho, \kappa)$ satisfies Conditions C1, C2 (8.39), C3 (Exercise 9.22), and C4 (Lemma A.1), we obtain (9.113).


Proof According to Condition @ in Sect. 8.2, we can choose an encoding 7, and a 
decoding v,, with d,-dimensional space K,, such that 


Qn 


1 — F?(pmix, Yn 0K oT) > 0 


and limy-s oo wits = C,,2(K). Condition © in Sect. 8.2 guarantees that there exists 
isometry U,, such that F?(pmix, Un 0 K2" 0 T,) < Fe(Pmix, Yn 0 K®”" 0 Ky,). From 
Condition © in Sect. 8.2 there exists a subspace Kc! C K,, with the dimension te 
such that 

® 1- F? (pix Vy, 0 Ken ° Ku,) 


2 


max 1 — F?(x, yp) 0 kK” o Ku, (x)) < 
xeK), 


Therefore, from Condition @ in Sect. 8.2 we have 


2 
= max (1— F7(p, On > 0. 
3 peS(K') ( eae? *un)) 
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Letting K2,, be a noiseless channel, we have max, | — F?((v, 0 62" 0 Ky, ® tr) (|x) 
(x|), (K2,n ® tr) (|x) (x|)) = 0. Thus, Condition C3 implies 


IC 0 Ke" 0 Ky,) — C(La,)| 


> 0. 
n 
From Condition C1 we have 
C(% 0K" 0 Ky,) 
oe” — Cy2(k) 
n 
IC 0K" 0 Ky,) —C(lm)| | log d, — log2 
< 2 se ee gt 7) ON 
n n 
Hence, Condition C2 guarantees that 
C(K2" C(t, 0 Ke" 
fe ep Cy 2(k). (9.115) 
n—oo n noo n 
We obtain the desired inequality. a 


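Further, Theorem 9.10 brings us the following corollary.

Corollary 9.1 We have the following relations

\begin{align*}
\lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n})
&= \lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n}),\, \kappa'} I_c(\rho, \kappa^{\otimes n} \circ \kappa') \\
&= \lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_R^{\otimes n})} -H_{\kappa^{\otimes n}(\rho)}(R|B), \tag{9.116}
\end{align*}

where $\kappa'$ is a TP-CP map on $\mathcal{H}_A^{\otimes n}$.

Proof Since the discussion in the direct part of Theorem 9.10 yields that $C_{q,2}(\kappa) \ge \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n}),\, \kappa'} I_c(\rho, \kappa^{\otimes n} \circ \kappa')$, we obtain the first equation in (9.116). For a state $\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_R^{\otimes n})$, we choose a pure state $|x\rangle$ on $\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_R^{\otimes n}$ such that $\mathrm{Tr}_A |x\rangle\langle x| = \mathrm{Tr}_A\, \rho$. Hence, there exists a TP-CP map $\kappa'$ on $\mathcal{H}_A^{\otimes n}$ such that $\rho = \kappa'(|x\rangle\langle x|)$. So, we have

\[
-H_{\kappa^{\otimes n}(\rho)}(R|B) = -H_{\kappa^{\otimes n} \circ \kappa'(|x\rangle\langle x|)}(R|B) = I_c(\mathrm{Tr}_R |x\rangle\langle x|, \kappa^{\otimes n} \circ \kappa'),
\]

which implies the second equation in (9.116). ∎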


Moreover, using Exercise 9.12, we can simplify the RHS of (9.102) in the following special case.

Lemma 9.9 When there exists another channel $\kappa'$ from the output system $\mathcal{H}_B$ of channel $\kappa$ to its environment system $\mathcal{H}_E$ such that

\[
\kappa'(\kappa(\rho)) = \kappa_E(\rho) \quad \text{for } \forall \rho \in \mathcal{S}(\mathcal{H}_A), \tag{9.117}
\]

then

\[
\max_{\rho \in \mathcal{S}(\mathcal{H}_A)} I_c(\rho, \kappa) = \lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n}). \tag{9.118}
\]

Further, when (9.117) holds, (9.76) implies the concavity

\[
\sum_i p_i I_c(\rho_i, \kappa) \le I_c\Big(\sum_i p_i \rho_i, \kappa\Big). \tag{9.119}
\]

For example, the following case satisfies the above condition. Suppose that there exist a basis $\{u_1, \ldots, u_d\}$ of the input system $\mathcal{H}_A$ of channel $\kappa$ and a POVM $M = \{M_i\}_{i=1}^d$ on the output system $\mathcal{H}_B$ such that

\[
\mathrm{Tr}\, \kappa(|u_i\rangle\langle u_i|) M_j = \delta_{i,j}. \tag{9.120}
\]

As checked below, this channel satisfies the condition (9.117). For example, the phase-damping channel $\kappa_D^{\mathrm{PD}}$ satisfies condition (9.120).

Now, consider the Naimark—Ozawa extension (Ho, po, U) with the ancilla Hp 
given in Theorem 7.1. Note that we measure the system 7H with the measurement 
basis {v),..., Ug}. We also use the Stinespring representation (Hc, pp, U;;) of K. 
Since for any input pure state p, the state (U @ Iz)((U;,(p ® p9)U;Z) ® po)(U ® Iz)* 
is pure, there exists a basis {vu}, ..., uv} on the space Hp @ He, where He = H4 ® 


Hc. This is ensured by (9.120). Hence, the unitary U’ = sy |v‘) (v;| satisfies 


Tr7.(U ® Iz)((Ux(p ® po)Uz) ® po)(U ® Iz)" 
=U'Trp,e(U ® Iz)(Ux(p ® py)UZ) ® po(U ® Iz)*U". 


Therefore, 


Ke(p) =Trp Trx,(U ® Iz)((U;(p ® py)U;z) ® po)(U ® Iz)* 
=Trp U'Trp.e(U ® Iz)((U,.(p ® py)U;Z) ® po)(U @ Iz)*U", 


which implies condition (9.117). 
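As a numerical companion to Lemma 9.9, the sketch below computes the coherent information $I_c(\rho, \kappa) = H(\kappa(\rho)) - H(\kappa_E(\rho))$ for a qubit dephasing channel. It assumes that the phase-damping channel is modeled by the two Kraus operators $\sqrt{1-q}\, I$ and $\sqrt{q}\, Z$ (a hypothetical parametrization) and uses the standard representation of the environment state, $[\kappa_E(\rho)]_{k,l} = \mathrm{Tr}\, E_k \rho E_l^*$. The code evaluates $I_c$ at two inputs only and does not perform the maximization over $\rho$.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    t = complex(m[0][0] + m[1][1]).real
    det = complex(m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    root = math.sqrt(max(t * t - 4 * det, 0.0))
    return [(t + root) / 2, (t - root) / 2]


def entropy(m):
    """von Neumann entropy in nats of a 2x2 density matrix."""
    return -sum(x * math.log(x) for x in eigenvalues_2x2(m) if x > 1e-15)


def dephasing_kraus(q):
    """Assumed model of phase damping: sqrt(1-q) * I and sqrt(q) * Z."""
    a, b = math.sqrt(1 - q), math.sqrt(q)
    return [[[a, 0], [0, a]], [[b, 0], [0, -b]]]


def coherent_information(rho, kraus):
    """I_c = H(kappa(rho)) - H(kappa_E(rho)) for a channel with two Kraus operators."""
    out = [[0, 0], [0, 0]]
    for e in kraus:
        term = matmul(matmul(e, rho), dagger(e))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    env = [[trace(matmul(matmul(ek, rho), dagger(el))) for el in kraus] for ek in kraus]
    return entropy(out) - entropy(env)


q = 0.1
ic_mixed = coherent_information([[0.5, 0], [0, 0.5]], dephasing_kraus(q))
ic_pure = coherent_information([[1, 0], [0, 0]], dephasing_kraus(q))
binary_entropy_q = -q * math.log(q) - (1 - q) * math.log(1 - q)
```

Under these assumptions the completely mixed input gives $\log 2 - h(q)$, where $h$ is the binary entropy, while a basis state that is left invariant by the channel gives zero.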
9.6.2 Proof of Hashing Inequality (8.121)


Finally, we prove the Hashing inequality (8.121) based on a remarkable relation between the transmission of the quantum state and the distillation of a partially entangled state.

Given a channel $\kappa$, we can define the partially entangled state $\kappa \otimes \iota_{A'}(|u_{0,0}^{A,A'}\rangle\langle u_{0,0}^{A,A'}|)$, where $\mathcal{H}_{A'}$ is the reference system of the input system $\mathcal{H}_A$ and $u_{0,0}^{A,A'}$ is a maximally entangled state between $\mathcal{H}_A$ and $\mathcal{H}_{A'}$. Conversely, given a partially entangled state $\rho$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_B$, we can define a channel $\kappa_\rho$ with the input system $\mathcal{H}_{A'}$ of the same dimension as the system $\mathcal{H}_A$ (with the dimension $d$) via quantum teleportation as follows [64]. Perform the generalized Bell measurement $\{|u_{i,j}^{A,A'}\rangle\langle u_{i,j}^{A,A'}|\}$ on the composite system $\mathcal{H}_A \otimes \mathcal{H}_{A'}$ and transmit the outcome $(i, j)$, where $u_{i,j}^{A,A'} := (I_A \otimes X^i Z^j) u_{0,0}^{A,A'}$. In this channel, the output system is given by $\mathcal{H}_B \otimes \mathbb{C}^{d^2}$. Now, we consider the purification of $\rho$ with the reference system $\mathcal{H}_R$. The environment system of the channel $\kappa_\rho$, on which $\kappa_{\rho,E}$ takes values, is $\mathcal{H}_R \otimes \mathbb{C}^{d^2}$. Now, we consider the wiretap channels in the same way as in Lemma 9.7. Then, we choose the distribution $p$ to be the uniform distribution on a CONS of $\mathcal{H}_{A'}$. Then, $I(p, \kappa_\rho) - I(p, \kappa_{\rho,E}) = I_c(\rho_{\mathrm{mix},A'}, \kappa_\rho) = H(\rho_B) - H(\rho_R) = -H_\rho(A|B)$. Applying Lemma 9.7 to the channel $\kappa_\rho^{\otimes n}$ similarly to the proof of the direct part of Theorem 9.10, we find that $E_{d,2}^{\to}(\rho) \ge -H_\rho(A|B)$.


9.6.3 Decoder with Assistance by Local Operations 


Next, we consider the quantum capacity when the decoder is allowed to employ several classes of local operations, e.g., two-way LOCC $\leftrightarrow$, separable operation $S$, and PPT operation PPT.

In this case, we need to describe the sender's system $\mathcal{H}_{A'}$ after the encoding. When the quantum system $\mathcal{H}$ is to be sent, the encoding operation is given as a TP-CP map $\tau$ from $\mathcal{H}$ to $\mathcal{H}_A \otimes \mathcal{H}_{A'}$. Then, a decoder $\nu_{\leftrightarrow}$ assisted by two-way LOCC operations is given as a two-way LOCC operation from $\mathcal{H}_B \otimes \mathcal{H}_{A'}$ to $\mathcal{H} \otimes \mathcal{H}_{A''}$, where $\mathcal{H}_{A''}$ is the one-dimensional system. Decoders $\nu_S$ and $\nu_{\mathrm{PPT}}$ assisted by separable operation and PPT operation are given as a separable operation and a PPT operation from $\mathcal{H}_B \otimes \mathcal{H}_{A'}$ to $\mathcal{H} \otimes \mathcal{H}_{A''}$, respectively. For one-way LOCC assistance, there are two cases. One is the one-way communication from the sender to the receiver. The other is the one-way communication from the receiver to the sender. However, the latter is meaningless for decoding because, to improve the quantum capacity, the transmitted information needs to be used by the receiver. So, we consider only the former case, and denote it by $\to$.

Then, for $C = \to, \leftrightarrow, S, \mathrm{PPT}$, we express our protocol by $\Phi_C = (\mathcal{H}, \tau, \nu_C)$. The quality of our protocol may be measured by the size $|\Phi_C| := \dim \mathcal{H}$ of the system to be sent. The accuracy of transmission is measured by $\varepsilon_1[\Phi_C] := \max_{u \in \mathcal{H}^1} \big[ 1 - F^2(u, \nu_C \circ \kappa \circ \tau(u)) \big]$ ($\mathcal{H}^1 := \{u \in \mathcal{H} \mid \|u\| = 1\}$). We also employ $\varepsilon_2[\Phi_C] := \big[ 1 - F_e^2(\rho_{\mathrm{mix}}, \nu_C \circ \kappa \circ \tau) \big]$ as another criterion of accuracy. Similar to (9.101), the capacities are defined as

\[
C_{q,C,i}(\kappa) := \sup_{\{\Phi_C^{(n)}\}} \left\{ \lim \frac{1}{n} \log |\Phi_C^{(n)}| \;\middle|\; \lim_{n\to\infty} \varepsilon_i[\Phi_C^{(n)}] = 0 \right\} \tag{9.121}
\]

for $i = 1, 2$ and $C = \to, \leftrightarrow, S, \mathrm{PPT}$, where the supremum is taken among codes $\Phi_C^{(n)}$ with decoding assistance by the class $C$. Similar to $C_c^\dagger$, we also define the strong converse quantum capacity by using the second criterion $\varepsilon_2[\Phi_C^{(n)}]$ as

\[
C_{q,C}^\dagger(\kappa) := \sup_{\{\Phi_C^{(n)}\}} \left\{ \lim \frac{1}{n} \log |\Phi_C^{(n)}| \;\middle|\; \lim \varepsilon_2[\Phi_C^{(n)}] < 1 \right\} \tag{9.122}
\]

for $C = \emptyset, \to, \leftrightarrow, S, \mathrm{PPT}$, where $C = \emptyset$ means the non-assistance case. Then, we have the following theorem.


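Theorem 9.11 The capacity $C_{q,C,i}(\kappa)$ is characterized as

\[
C_{q,C,1}(\kappa) = C_{q,C,2}(\kappa) = \lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_R)} E_{d,2}^{C}(\kappa^{\otimes n} \otimes \iota_R(\rho)) \tag{9.123}
\]

for $C = \to, \leftrightarrow, S, \mathrm{PPT}$.

Although we explicitly describe $\iota_R$ in (9.123), we will omit it later.

Proof We can show that $C_{q,C,1}(\kappa) = C_{q,C,2}(\kappa)$ in the same way as $C_{q,1}(\kappa) = C_{q,2}(\kappa)$. Hence, we discuss only $C_{q,C,2}(\kappa)$.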
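Direct part: First, we show the relation

\[
C_{q,C,2}(\kappa) \ge \max_{\rho \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_R)} E_{d,2}^{C}(\kappa(\rho)).
\]

Consider the following protocol. The sender prepares the state $\rho \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_R)$. Next, the sender sends the part $\mathcal{H}_A$ of $\rho$ via the channel $\kappa$. After the receiver receives the state in $\mathcal{H}_B$, the sender and the receiver apply a distillation protocol to achieve the rate $E_{d,2}^{C}(\kappa \otimes \iota_R(\rho))$. Finally, the sender and the receiver perform the teleportation protocol. This method achieves the rate $E_{d,2}^{C}(\kappa(\rho))$. Applying the same method to the state $\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_R)$, we can achieve the rate $\lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_R)} E_{d,2}^{C}(\kappa^{\otimes n}(\rho))$.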

Converse part: Consider a sequence of codes oi y= (H®", T,Vc.n) such that 
limy-+ 00 €o[ 60] = 0. Let |®,)(®,| be the maximally entangled state on H®” @ 
H2". We define the bipartite state py := Tn(|®n)(,|) on HE” ® (Harn ® HE"), 
where 7, is a TP-CP map from H®"” to ‘are ® Han. Then, we have 


(Pn \Ve.n(K2" (pn))|Pn) = 1 — €2[GL?] > 1. 
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So, due to (5.103), we have 


1 1 
lim — max EC (x @ir(p)) = lim ——H,,,(n2"(p,))(RA'|B) 
non peS(H2"@Hr) noo n : 
: 1 : @n : 1 (n) 
= lim —logdimH™" = lim — log|®c’|. (9.124) 
non n>o n 
a 


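As shown later, the relation

\[
\lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n}) = \lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n} \otimes \mathcal{H}_R)} E_{d,2}^{\to}(\kappa^{\otimes n} \otimes \iota_R(\rho)) \tag{9.125}
\]

holds, where $\to$ is the one-way LOCC from $R$ to $B$. So, combining (9.102) and (9.123), we have

\[
C_{q,2}(\kappa) = C_{q,\to,2}(\kappa), \tag{9.126}
\]

which implies that the post one-way communication does not improve the capacity.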
Proof of (9.125) Since the inequality < in (9.125) is trivial, we show only the part 
>. Choose a state p € SCH?” ® Hp) and an instrument & = {K;}; on the system 
Hr such that Ey (K®" ® tr(p)) = —Hy, Lp@ki (K2"@re(p)))@li,i) (i,i| R|B). Define the 
probability p; := Tr «;(p). Then, 


— AS, 1p Qn; (x® @re(p))@li,i)(i,i| (RIB) 


= — Ay, 2G 1p (.4@ri(p))81i,i)(i,i| (RIB) 
= De Pi Aone rg(Liganiipy) RIB) 


L 


< max — H,,2n(p)(R|B) = max I.(p, Kk"). 
peS(He"@H}’) peS(He"@HY’) 
Hence, (9.116) of Corollary 9.1 yields (9.125). | 


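Next, we address the strong converse quantum capacity $C_{q,C}^\dagger(\kappa)$. For this purpose, we define the SDP bound:

\begin{align*}
C_{\mathrm{SDP}}(\kappa) &:= \max_{\rho \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_R)} E_{\mathrm{SDP}}(\kappa(\rho)) \\
&\overset{(a)}{=} \max_{|y\rangle\langle y| \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_R)} E_{\mathrm{SDP}}(\kappa(|y\rangle\langle y|)) = \max_{\rho \in \mathcal{S}(\mathcal{H}_A)} E_{\mathrm{SDP}}(\kappa(|x\rangle\langle x|)), \tag{9.127}
\end{align*}

where $|x\rangle$ is the purification of $\rho$. Equation (a) follows from the convexity (8.232). We also have another expression for the SDP bound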
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Cspp(k) = max 2. POPU») I29. (9.128) 


Pp o'20: iro <l 


where $p$ is a distribution taking values in the set of pure states on $\mathcal{H}_A \otimes \mathcal{H}_R$. Here, $D(\rho\|\sigma)$ is defined as $\mathrm{Tr}\, \rho(\log \rho - \log \sigma)$ even though $\mathrm{Tr}\, \sigma \neq 1$. (See Exercise 5.25.) Then, we can apply the minimax theorem (Theorem A.9), which implies that


C. = min max D f 
spp(k) = min ma 270) (x(ly)(yDI1o") 
= min max D(K(ly)(yD Ilo’) 
o'€S(Ha@He) ly) (vleS(Ha@Ha) 
© min max D(«(p)||o’), (9.129) 


a ESCH A@HR) peS(Ha@Hp) 


where (a) follows from the convexity of the function $\rho \mapsto D(\kappa(\rho)\|\sigma')$.

Now, we introduce another expression for the SDP bound. Choosing a state $\rho$ on the input system, we consider its purification $|y\rangle\langle y|$. Then, we have the concavity of the quantity $I_{\mathrm{SDP}}(\rho, \kappa) := E_{\mathrm{SDP}}(\kappa(|y\rangle\langle y|))$ with respect to $\rho$ (Lemma 9.11). This property can also be shown by the monotonicity for local TP-CP maps because $I_{\mathrm{SDP}}(\rho, \kappa)$ is independent of the choice of the purification.

Lemma 9.10 The SDP bound $C_{\mathrm{SDP}}(\kappa)$ satisfies the subadditivity:

\[
C_{\mathrm{SDP}}(\kappa_1 \otimes \kappa_2) \le C_{\mathrm{SDP}}(\kappa_1) + C_{\mathrm{SDP}}(\kappa_2). \tag{9.130}
\]


Proof Choose the pure state |x‘) (x'| and the positive semidefinite matrix a; > Oas 
II74(o/)|l1 = 1 and 


Cspp (Ki) = D(Ki (|x") (x"|) 01). (9.131) 


Then, for a pure state |x)(x| € S(H4, ®@ Ha, @ Hr, ® He,), using (5.86) in the 
following step (a), we have 


D(K ® K(x) (x[)[lo, ® 04) 
= — Trky ® ko(|x)(x|)(log(a1) ® bh + Nh @ log(oy)) — A(k1 ® ka(|x) (x)) 


S — Try (Try 1 @ wa(|x) (x1) log(o’) — Tra(Tri #1 ® Ka([x) (x1) log(o}) 
— A(Tr ki @ Ka(|x)(x|)) — A (Tr2 k1 @ Ka(|x) (x])) 
=D(K1(Tr2 |x)(x|)l]o4) + D(K2 (Try |x) (x1) |]o4) 


(b) 
<Cspp(K1) + Cspp(k2), 
where (b) follows from (9.129). a 


We have the following theorem. 
Theorem 9.12 ([65]) The strong converse quantum capacity $C^{\dagger}_{q,2}(\kappa)$ is bounded as follows.
\[
C^{\dagger}_{q,2}(\kappa) \le C_{\mathrm{SDP}}(\kappa). \tag{9.132}
\]

This theorem means that the SDP bound $C_{\mathrm{SDP}}(\kappa)$ upper bounds the strong converse
quantum capacity $C^{\dagger}_{q,2}(\kappa)$. Since the SDP bound $C_{\mathrm{SDP}}(\kappa)$ has the single-letterized forms
(9.127), (9.128), and (9.129), this formula is helpful for evaluating $C^{\dagger}_{q,2}(\kappa)$.
To show Theorem 9.12, we prepare the following quantities. 


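\[
I_{\alpha|\mathrm{SDP}}(\rho, \kappa) := E_{\alpha|\mathrm{SDP}}(\kappa(|y\rangle\langle y|)), \tag{9.133}
\]
\[
C_{\alpha|\mathrm{SDP}}(\kappa) := \max_{\rho \in \mathcal{S}(\mathcal{H}_A)} I_{\alpha|\mathrm{SDP}}(\rho, \kappa) = \max_{|\psi\rangle\langle\psi| \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_R)} E_{\alpha|\mathrm{SDP}}(\kappa(|\psi\rangle\langle\psi|)), \tag{9.134}
\]
\[
\tilde{I}_{\alpha|\mathrm{SDP}}(\rho, \kappa) := \tilde{E}_{\alpha|\mathrm{SDP}}(\kappa(|y\rangle\langle y|)), \tag{9.135}
\]
\[
\tilde{C}_{\alpha|\mathrm{SDP}}(\kappa) := \max_{\rho \in \mathcal{S}(\mathcal{H}_A)} \tilde{I}_{\alpha|\mathrm{SDP}}(\rho, \kappa) = \max_{|\psi\rangle\langle\psi| \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_R)} \tilde{E}_{\alpha|\mathrm{SDP}}(\kappa(|\psi\rangle\langle\psi|)), \tag{9.136}
\]
where $|y\rangle$ is a purification of $\rho \in \mathcal{S}(\mathcal{H}_A)$.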


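Lemma 9.11 For states $\rho_i$ on $\mathcal{H}_A$ and a distribution $p_i$, we have
\[
I_{1+s|\mathrm{SDP}}\Bigl(\sum_i p_i \rho_i, \kappa\Bigr) \ge \frac{1+s}{s} \log\Bigl(\sum_i p_i e^{\frac{s}{1+s} I_{1+s|\mathrm{SDP}}(\rho_i, \kappa)}\Bigr) \quad \text{for } s \in [-1, 1] \setminus \{0\}, \tag{9.137}
\]
\[
\tilde{I}_{1+s|\mathrm{SDP}}\Bigl(\sum_i p_i \rho_i, \kappa\Bigr) \ge \frac{1+s}{s} \log\Bigl(\sum_i p_i e^{\frac{s}{1+s} \tilde{I}_{1+s|\mathrm{SDP}}(\rho_i, \kappa)}\Bigr) \quad \text{for } s \in [-\tfrac{1}{2}, \infty) \setminus \{0\}. \tag{9.138}
\]
As the limit $s \to 0$, we have the concavity
\[
I_{\mathrm{SDP}}\Bigl(\sum_i p_i \rho_i, \kappa\Bigr) \ge \sum_i p_i I_{\mathrm{SDP}}(\rho_i, \kappa). \tag{9.139}
\]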


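Proof We show (9.137) with $s \in (0, 1]$. We choose the purification $|x_i\rangle$ of $\rho_i$ on
$\mathcal{H}_A \otimes \mathcal{H}_R$ and the positive semidefinite matrix $\sigma_i'$ such that $\|\tau^A(\sigma_i')\|_1 = 1$ and
$E_{1+s|\mathrm{SDP}}(\kappa(|x_i\rangle\langle x_i|)) = D_{1+s}(\kappa(|x_i\rangle\langle x_i|) \| \sigma_i')$.

Now, we consider the bipartite system $\mathcal{H}_A \otimes (\mathcal{H}_R \otimes \mathbb{C}^k)$ and focus on the state
$|y\rangle = \sum_i \sqrt{p_i}\, |x_i, i\rangle$ and an arbitrary positive semidefinite matrix $\sigma'$ satisfying
$\|\tau^A(\sigma')\|_1 = 1$. Using $P_i := I \otimes |i\rangle\langle i|$, we define $q_i := \|\tau^A(P_i \sigma' P_i)\|_1$ and $\sigma_i' :=
\frac{1}{q_i} \langle i| P_i \sigma' P_i |i\rangle$. Applying (a) of Exercise 5.25 to the TP-CP map $\sigma' \mapsto \sum_i P_i \sigma' P_i$,
we have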
eofitsispeCi Xi pis) > eoPits (aly) (yDIlo)


> oP Pres (Es PIMOIPIIE, Pa’ P) = 7 AHS pos psPrselwe eno), 


L 


By taking the maximum for o¢, the reverse Hélder inequality (A.27) yields that 


sTi+s|spp (>); Aipisk) 14s —s_sliisispp(pi.k) 
efits) ; Xi Pi > rj p;*e s| i 
i 


l+s 
>(D sett , 
i 


which implies (9.137) with $s \in (0, 1]$. Replacing the reverse Hölder
inequality (A.27) by the Hölder inequality (A.25), we obtain (9.137) with $s \in [-1, 0)$.
Similarly, we can show (9.138). □


Lemma 9.12 [65] We have 


@n ad, 
Tuispp (kK) <nT) spp (kK) + i logn fora € [1, 2] (9.140) 
~ @n ~ ad, 
Tuispp(K>") <n spp (kK) + | logn fora € [1, &), (9.141) 


where $d_A$ is the dimension of $\mathcal{H}_A$.


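Proof For an input state $\rho$ on $\mathcal{H}_A^{\otimes n}$, we define $\bar{\rho} := \frac{1}{n!}\sum_{\pi} U_\pi \rho U_\pi^*$, where $\pi$ is a
permutation among $n$ letters and $U_\pi$ is defined in (2.213). Since $I_{\alpha|\mathrm{SDP}}(\rho, \kappa^{\otimes n}) =
I_{\alpha|\mathrm{SDP}}(U_\pi \rho U_\pi^*, \kappa^{\otimes n})$, Lemma 9.11 implies that
\[
I_{\alpha|\mathrm{SDP}}(\rho, \kappa^{\otimes n}) \le I_{\alpha|\mathrm{SDP}}(\bar{\rho}, \kappa^{\otimes n}). \tag{9.142}
\]
We choose the purification $|\psi'\rangle$ of $\bar{\rho}$. (2.215) implies that
\[
|\psi'\rangle\langle\psi'| \le (n+1)^{d_A^2 - 1} \int |\psi\rangle\langle\psi|^{\otimes n} \mu(d\psi), \tag{9.143}
\]
where $\mu$ is the Haar measure on $\mathcal{H}_A \otimes \mathcal{H}_R$ and $\mathcal{H}_R$ is the reference system of $\mathcal{H}_A$.
For a positive semidefinite matrix $\sigma' \ge 0$, we have
\[
D_\alpha(\kappa^{\otimes n}(|\psi'\rangle\langle\psi'|) \| \sigma')
\le \frac{\alpha(d_A^2 - 1)}{\alpha - 1} \log(n+1) + D_\alpha\Bigl(\kappa^{\otimes n}\Bigl(\int |\psi\rangle\langle\psi|^{\otimes n} \mu(d\psi)\Bigr) \Big\| \sigma'\Bigr). \tag{9.144}
\]
Taking the minimum for $\sigma'$ with the condition $\|\tau^A(\sigma')\|_1 = 1$, we have
\[
\begin{aligned}
E_{\alpha|\mathrm{SDP}}(\kappa^{\otimes n}(|\psi'\rangle\langle\psi'|))
&\le \frac{\alpha(d_A^2 - 1)}{\alpha - 1} \log(n+1) + E_{\alpha|\mathrm{SDP}}\Bigl(\kappa^{\otimes n}\Bigl(\int |\psi\rangle\langle\psi|^{\otimes n} \mu(d\psi)\Bigr)\Bigr) \\
&\overset{(a)}{\le} \frac{\alpha(d_A^2 - 1)}{\alpha - 1} \log(n+1) + \max_{\psi} E_{\alpha|\mathrm{SDP}}(\kappa^{\otimes n}(|\psi\rangle\langle\psi|^{\otimes n})) \\
&\le \frac{\alpha(d_A^2 - 1)}{\alpha - 1} \log(n+1) + n \max_{\psi} E_{\alpha|\mathrm{SDP}}(\kappa(|\psi\rangle\langle\psi|)),
\end{aligned} \tag{9.145}
\]
where (a) follows from Exercise 8.72. Taking the maximum for $|\psi'\rangle$, the second
equation of (9.134) implies (9.140). We can show (9.141) in the same way. □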


Proof of Theorem 9.12 Consider a sequence of codes op? = (H®", T, Vc.n) Such that 
dim H®”" = e””’. Let |®,)(®,| be the maximally entangled state on H®” @ He. We 
define the bipartite state py := Tn(|Pn)(Pp|) on HS” @ (Harn ® HR"), where Ty isa 
TP-CP map from H®" to He" ® Han. Then, since |@,,) is the maximally entangled 
state with the size e””, the relation (8.364) withn = 1,r = nr’, p®” = K®"(p,), and 
K,, = Vc.» implies that 

Ey 45) 9pp(eO" (on) +snr! 

I+s 


(PD, Von(K>" (pn))|Pn) < e 


=C14s| SDP (K2")-+8nr! 
TFs 


(+s)d4 —C1 45) spp («)+sr/ 
n 
s e 


<e <n I+s 


for s € [0, 1], where (a) follows from (9.140) of Lemma 9.12. Since 


fi / 
fag SOE OT 6, (9.146) 
see[0, 1] l+s 


when $r' > C_{\mathrm{SDP}}(\kappa)$, we obtain
\[
C^{\dagger}_{q,2}(\kappa) \le C_{1+s|\mathrm{SDP}}(\kappa) \quad \text{for } s \in [0, 1], \tag{9.147}
\]
which implies (9.132).
Similarly, we can show 


(+s)d, 
= A i 


—C145| spp (k)+sr! 
t THs 


(Dy |VCn Ke" (pn))|Pn) <n 
for $s \in [0, \infty)$. So, we can show
\[
C^{\dagger}_{q,2}(\kappa) \le \tilde{C}_{1+s|\mathrm{SDP}}(\kappa) \quad \text{for } s \in [0, \infty), \tag{9.148}
\]
which gives another proof of (9.132). □


Exercises 


9.20 Show that Cy 9(«) < Cy,1(«) using (8.25). 
9.21 Show that Cy.1(«) > Cq,2(«) using (8.26).


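9.22 Show that $\max_\rho I_c(\rho, \kappa)$ satisfies Condition C3 similarly to Exercise 8.44.
Here, use Fannes inequality (Theorem 5.12) for two states $(\kappa_{1,n} \otimes \iota_R)(|x\rangle\langle x|)$ and
$(\kappa_{2,n} \otimes \iota_R)(|x\rangle\langle x|)$.


9.23 Give an alternative proof of $C_{q,2}(\kappa) \le \lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n})$ by
following the steps below [66].

(a) Let $\kappa$ be a quantum channel with the input system $\mathcal{H}_A$, and $\Phi$ be a code for the
channel $\kappa$. Show the existence of a code $\Phi'$ such that $|\Phi'| = \min(\dim \mathcal{H}_A, |\Phi|)$ and
$\varepsilon_2[\Phi'] \le 2\varepsilon_2[\Phi]$, by using property © of Sect. 8.2.

(b) Let $\Phi = (\mathcal{H}_A, \kappa_U, \nu)$ be a code with an isometric encoder $\kappa_U$ for a channel $\kappa$. Let
$\rho_{\mathrm{mix}}$ be a completely mixed state in $\mathcal{H}_A$. Define $\delta := \sqrt{2(1 - F_e(\rho_{\mathrm{mix}}, \nu \circ \kappa \circ \kappa_U))}$. Show
that
\[
\max_{\rho \in \mathcal{S}(\mathcal{H}_A)} I_c(\rho, \kappa) \ge I_c(\kappa_U(\rho_{\mathrm{mix}}), \kappa) \ge I_c(\rho_{\mathrm{mix}}, \nu \circ \kappa \circ \kappa_U)
\ge \log|\Phi| - 2\delta(\log|\Phi| - \log\delta),
\]
by using (8.39), (8.42), and (8.38).

(c) Given a sequence of codes $\Phi^{(n)} = (\mathcal{H}_A^{\otimes n}, \tau^{(n)}, \nu^{(n)})$ satisfying $\varepsilon_2[\Phi^{(n)}] \to 0$,
show that
\[
\lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n}) \ge \limsup_{n\to\infty} \frac{1}{n} \log \min\{|\Phi^{(n)}|, d_A^n\}. \tag{9.149}
\]

(d) Complete the alternative proof of $C_{q,2}(\kappa) \le \lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n})$.


9.24 Show the concavity of the map $\rho \mapsto I_{\mathrm{SDP}}(\rho, \kappa)$. (Hint: use the monotonicity of
$E_{\mathrm{SDP}}$ for local TP-CP maps.)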


9.7 Examples 


In this section, we calculate the capacities $C_c(\kappa)$, $\tilde{C}_c(\kappa)$, $C_c^e(\kappa)$, $C_{c,e}^e(\kappa)$, and $C_{q,1}(\kappa)$
in several cases.


9.7.1 Group Covariance Formulas 


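For this purpose, we derive formulas for $C_c(\kappa)$, $C_c^e(\kappa)$, $C_{c,e}^e(\kappa)$, and $C_{q,1}(\kappa)$ with
the group covariance. Let $\kappa$ be a TP-CP map from system $\mathcal{H}_A$ to $\mathcal{H}_B$. Assume
that there exists an irreducible (projective) representation $U_A$ of a group $G$ on the
space $\mathcal{H}_A$ satisfying the following: there exist unitary matrices (not necessarily a
representation) $\{U_B(g)\}_{g \in G}$ on $\mathcal{H}_B$ such that
\[
\kappa(U_A(g) \rho U_A(g)^*) = U_B(g) \kappa(\rho) U_B(g)^* \tag{9.150}
\]
for any density $\rho$ and element $g \in G$. In what follows, we derive useful formulas under
the above assumption. Then,
\[
I(p, \kappa) = I(p_{U_A(g)}, \kappa),
\]
where $p_U(\rho) := p(U \rho U^*)$. Hence, writing $\nu$ for the invariant probability measure on $G$,
\[
\begin{aligned}
I(p, \kappa) &= \int_G I(p_{U_A(g)}, \kappa)\, \nu(dg) \le I\Bigl(\int_G p_{U_A(g)}\, \nu(dg), \kappa\Bigr) \\
&= H(\kappa(\rho_{\mathrm{mix}})) - \int_G \sum_x p_x H(\kappa(U_A(g)^* \rho_x U_A(g)))\, \nu(dg) \\
&\le H(\kappa(\rho_{\mathrm{mix}})) - \min_\rho H(\kappa(\rho)).
\end{aligned}
\]
This upper bound is attained by the distribution $(\nu(dg), U_A(g)^* \rho_{\min} U_A(g))$, where $\rho_{\min}$ attains $\min_\rho H(\kappa(\rho))$. Thus,
\[
C_c(\kappa) = H(\kappa(\rho_{\mathrm{mix}})) - \min_\rho H(\kappa(\rho)). \tag{9.151}
\]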


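Next, we define the representation $U_A^{(n)}$ of the group $G^n := \underbrace{G \times G \times \cdots \times G}_{n}$ on
the $n$-fold tensor product system $\mathcal{H}_A^{\otimes n}$ as
\[
U_A^{(n)}(g_1, \ldots, g_n) := U_A(g_1) \otimes \cdots \otimes U_A(g_n).
\]
Then, the set of unitary matrices
\[
U_B^{(n)}(g_1, \ldots, g_n) := U_B(g_1) \otimes \cdots \otimes U_B(g_n)
\]
satisfies
\[
\kappa^{\otimes n}\bigl(U_A^{(n)}(g_1, \ldots, g_n)\, \rho\, U_A^{(n)}(g_1, \ldots, g_n)^*\bigr)
= U_B^{(n)}(g_1, \ldots, g_n)\, \kappa^{\otimes n}(\rho)\, U_B^{(n)}(g_1, \ldots, g_n)^*
\]
for any density $\rho$ on the $n$-fold tensor product system $\mathcal{H}_A^{\otimes n}$. If $U_A$ is irreducible, then
$U_A^{(n)}$ is also irreducible. Hence, we have the formula
\[
C_c(\kappa^{\otimes n}) = n H(\kappa(\rho_{\mathrm{mix}})) - \min_\rho H(\kappa^{\otimes n}(\rho)).
\]
Thus,
\[
C_c^e(\kappa) = H(\kappa(\rho_{\mathrm{mix}})) - \lim_{n\to\infty} \frac{\min_\rho H(\kappa^{\otimes n}(\rho))}{n}. \tag{9.152}
\]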


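Further, relation (9.150) yields that
\[
(\kappa \otimes \iota_R)\bigl((U_A(g) \otimes I_R) |u\rangle\langle u| (U_A(g)^* \otimes I_R)\bigr)
= (U_B(g) \otimes I_R)(\kappa \otimes \iota_R)(|u\rangle\langle u|)(U_B(g)^* \otimes I_R).
\]
Hence, we have
\[
I(\rho, \kappa) = I(U_A(g) \rho U_A(g)^*, \kappa), \quad I_c(\rho, \kappa) = I_c(U_A(g) \rho U_A(g)^*, \kappa). \tag{9.153}
\]
Concavity (8.45) of the transmission information guarantees that
\[
I(\rho, \kappa) = \int_G I(U_A(g) \rho U_A(g)^*, \kappa)\, \nu(dg)
\le I\Bigl(\int_G U_A(g) \rho U_A(g)^*\, \nu(dg), \kappa\Bigr) = I(\rho_{\mathrm{mix}}, \kappa), \tag{9.154}
\]
which implies
\[
C_{c,e}^e(\kappa) = I(\rho_{\mathrm{mix}}, \kappa) = \log d_A + \log d_B - H(\kappa \otimes \iota_R(|\Phi_d\rangle\langle\Phi_d|)),
\]
where $|\Phi_d\rangle\langle\Phi_d|$ is the maximally entangled state.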

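Next, we consider the quantum capacity $C_{q,1}(\kappa)$ when the wiretap channel $(\kappa, \kappa^E)$
is a degraded channel. In this case, concavity (9.119) holds. Hence, using the second
relation in (9.153), we have
\[
C_{q,1}(\kappa) = \max_\rho I_c(\rho, \kappa) = I_c(\rho_{\mathrm{mix}}, \kappa) = \log d_B - H((\kappa \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)). \tag{9.155}
\]
Since the SDP bound $I_{\mathrm{SDP}}(\rho, \kappa)$ is concave for $\rho$ (Exercise 9.24, Lemma 9.11), the SDP bound
$C_{\mathrm{SDP}}(\kappa)$ is calculated to be $I_{\mathrm{SDP}}(\rho_{\mathrm{mix}}, \kappa)$.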


9.7.2 d-Dimensional Depolarizing Channel 


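When $\kappa$ is the $d$-dimensional depolarizing channel $\kappa_{d,\lambda}$, the natural representation
of $SU(d)$ satisfies the above condition. Hence, using (9.151), we have
\[
C_c(\kappa_{d,\lambda}) = H(\rho_{\mathrm{mix}}) - H(\lambda |u\rangle\langle u| + (1 - \lambda)\rho_{\mathrm{mix}})
= \frac{\lambda d + (1 - \lambda)}{d} \log(\lambda d + (1 - \lambda)) + \frac{(d - 1)(1 - \lambda)}{d} \log(1 - \lambda).
\]
Indeed, we can easily check that this bound is attained by commutative input states.
Thus,
\[
\tilde{C}_c(\kappa_{d,\lambda}) = \frac{\lambda d + (1 - \lambda)}{d} \log(\lambda d + (1 - \lambda)) + \frac{(d - 1)(1 - \lambda)}{d} \log(1 - \lambda).
\]
For entangled input states, King [11] showed that
\[
\min_\rho H(\kappa_{d,\lambda}^{\otimes n}(\rho)) = n H(\lambda |u\rangle\langle u| + (1 - \lambda)\rho_{\mathrm{mix}}). \tag{9.156}
\]
Thus, formula (9.152) yields [11]
\[
C_c^e(\kappa_{d,\lambda}) = C_c(\kappa_{d,\lambda}).
\]
Further, from (8.332) and (8.330),
\[
\begin{aligned}
C_{c,e}^e(\kappa_{d,\lambda}) &= I(\rho_{\mathrm{mix}}, \kappa_{d,\lambda}) = 2\log d - H\bigl((\kappa_{d,\lambda} \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)\bigr) \\
&= 2\log d + \frac{1 + \lambda(d^2 - 1)}{d^2} \log\frac{1 + \lambda(d^2 - 1)}{d^2} + \frac{(1 - \lambda)(d^2 - 1)}{d^2} \log\frac{1 - \lambda}{d^2}.
\end{aligned}
\]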
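The closed forms of this subsection can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes natural logarithms and the parametrization $\kappa_{d,\lambda}(\rho) = \lambda\rho + (1-\lambda)\rho_{\mathrm{mix}}$ used above, and the function names are illustrative. It computes $C_c$ from the output spectrum of a pure input, compares it with the closed form, and computes $C_{c,e}^e$ from the spectrum of $(\kappa_{d,\lambda} \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)$.

```python
import math


def entropy(spectrum):
    """Entropy (natural log) of a list of eigenvalues; zero eigenvalues contribute 0."""
    return -sum(x * math.log(x) for x in spectrum if x > 0)


def holevo_capacity_spectral(d, lam):
    """C_c = H(rho_mix) - H(lam |u><u| + (1 - lam) rho_mix), computed from the spectrum."""
    big = lam + (1 - lam) / d      # eigenvalue along the input vector |u>
    small = (1 - lam) / d          # the remaining d - 1 eigenvalues
    return math.log(d) - entropy([big] + [small] * (d - 1))


def holevo_capacity_closed(d, lam):
    """Closed form stated in the text for C_c of the depolarizing channel."""
    a = lam * d + (1 - lam)
    first = a / d * math.log(a) if a > 0 else 0.0
    second = (d - 1) * (1 - lam) / d * math.log(1 - lam) if lam < 1 else 0.0
    return first + second


def entanglement_assisted_capacity(d, lam):
    """C_{c,e}^e = 2 log d - H(lam |Phi><Phi| + (1 - lam) rho_mix (x) rho_mix)."""
    big = lam + (1 - lam) / d**2   # eigenvalue on the maximally entangled state
    small = (1 - lam) / d**2       # the remaining d^2 - 1 eigenvalues
    return 2 * math.log(d) - entropy([big] + [small] * (d * d - 1))
```

For example, `holevo_capacity_spectral(3, 0.5)` and `holevo_capacity_closed(3, 0.5)` agree up to rounding, and at $\lambda = 1$ (the identity channel) the two capacities reduce to $\log d$ and $2\log d$.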


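9.7.3 Transpose Depolarizing Channel


In the transpose depolarizing channel $\kappa^T_{d,\lambda}$, the natural representation of $SU(d)$ satisfies
the above condition. However, $U_B$ is not a representation. As with a depolarizing
channel, using (9.151), we have
\[
C_c(\kappa^T_{d,\lambda}) = \tilde{C}_c(\kappa^T_{d,\lambda})
= \frac{\lambda d + (1 - \lambda)}{d} \log(\lambda d + (1 - \lambda)) + \frac{(d - 1)(1 - \lambda)}{d} \log(1 - \lambda).
\]
Further, relation (9.156) yields
\[
\min_\rho H((\kappa^T_{d,\lambda})^{\otimes n}(\rho)) = \min_\rho H((\kappa_{d,\lambda})^{\otimes n}(\rho^T)) = \min_\rho H((\kappa_{d,\lambda})^{\otimes n}(\rho))
= n H(\lambda |u\rangle\langle u| + (1 - \lambda)\rho_{\mathrm{mix}})
\]
for $\lambda \ge 0$. Hence,
\[
C_c^e(\kappa^T_{d,\lambda}) = C_c(\kappa^T_{d,\lambda})
\]
for $\lambda \ge 0$. Matsumoto and Yura [18] proved this relation for $\lambda = -\frac{1}{d-1}$.

Further, from (8.333) and (8.324), its entanglement-assisted capacity is
\[
\begin{aligned}
C_{c,e}^e(\kappa^T_{d,\lambda}) &= I(\rho_{\mathrm{mix}}, \kappa^T_{d,\lambda}) = 2\log d - H\bigl((\kappa^T_{d,\lambda} \otimes \iota_R)(|\Phi_d\rangle\langle\Phi_d|)\bigr) \\
&= 2\log d + \frac{(1 - (d + 1)\lambda)(d - 1)}{2d} \log\frac{1 - (d + 1)\lambda}{d^2}
+ \frac{(1 + (d - 1)\lambda)(d + 1)}{2d} \log\frac{1 + (d - 1)\lambda}{d^2}.
\end{aligned}
\]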
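The entanglement-assisted capacity of the transpose depolarizing channel can be cross-checked from a spectrum. This is a hedged sketch rather than the book's derivation: it assumes $\kappa^T_{d,\lambda}(\rho) = \lambda\rho^T + (1-\lambda)\rho_{\mathrm{mix}}$, so that the output for the maximally entangled input is $\lambda F/d + (1-\lambda) I/d^2$ with $F$ the swap operator, whose eigenvalues are $+1$ on the symmetric subspace and $-1$ on the antisymmetric subspace. Natural logarithms are used and the function name is illustrative.

```python
import math


def ea_capacity_transpose_depolarizing(d, lam):
    """2 log d - H(output state), using the symmetric/antisymmetric eigenspaces."""
    sym_dim = d * (d + 1) // 2            # dimension of the symmetric subspace
    anti_dim = d * (d - 1) // 2           # dimension of the antisymmetric subspace
    sym_eig = (1 + (d - 1) * lam) / d**2  # eigenvalue on the symmetric subspace
    anti_eig = (1 - (d + 1) * lam) / d**2 # eigenvalue on the antisymmetric subspace
    h = 0.0
    for mult, eig in ((sym_dim, sym_eig), (anti_dim, anti_eig)):
        if eig > 0:
            h -= mult * eig * math.log(eig)
    return 2 * math.log(d) - h
```

At the two ends of the admissible range, $\lambda = 1/(d+1)$ and $\lambda = -1/(d-1)$, the output state is uniform on the symmetric and the antisymmetric subspace respectively, so the value reduces to $\log\frac{2d}{d+1}$ and $\log\frac{2d}{d-1}$.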


9.7.4 Generalized Pauli Channel 


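In the generalized Pauli channel $\kappa^{GP}_p$, the representation of the group $(i, j) \in \mathbb{Z}_d \times
\mathbb{Z}_d \mapsto X_d^i Z_d^j$ satisfies condition (9.150). Its entanglement-assisted capacity can be
calculated as
\[
C_{c,e}^e(\kappa^{GP}_p) = I(\rho_{\mathrm{mix}}, \kappa^{GP}_p) = 2\log d - H(p).
\]
When the dimension $d$ is equal to 2, using (5.34), we can check
\[
C_c(\kappa^{GP}_p) = \tilde{C}_c(\kappa^{GP}_p) = \log 2 - \min_{i \neq j} h(p_i + p_j).
\]
In this case, as mentioned in (d) in Sect. 9.2, King [10] showed that
\[
C_c^e(\kappa^{GP}_p) = C_c(\kappa^{GP}_p) = \tilde{C}_c(\kappa^{GP}_p).
\]
When the distribution $p = (p_{i,j})$ satisfies $p_{i,j} = 0$ for $j \neq 0$ in the $d$-dimensional
system, we have
\[
C_c^e(\kappa^{GP}_p) = C_c(\kappa^{GP}_p) = \tilde{C}_c(\kappa^{GP}_p) = \log d.
\]
In this case, the channel $\kappa^{GP}_p$ is a phase-damping channel. As proved in Sect. 9.7.7,
it satisfies condition (9.117). Hence, (9.155) yields
\[
C_{q,1}(\kappa^{GP}_p) = I_c(\rho_{\mathrm{mix}}, \kappa^{GP}_p) = \log d - H(p).
\]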
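The formulas for the generalized Pauli channel depend only on the probability vector $p$, so they are easy to evaluate. The sketch below is illustrative and rests on stated assumptions: natural logarithms, $p$ given as a flat list of probabilities, and (for the qubit case) the four entries ordered arbitrarily since the minimum runs over all pairs. The function names are not from the text.

```python
import math
from itertools import combinations


def shannon_entropy(p):
    """Shannon entropy H(p) with natural logarithm."""
    return -sum(x * math.log(x) for x in p if x > 0)


def binary_entropy(x):
    """Binary entropy h(x)."""
    return shannon_entropy([x, 1 - x])


def ea_capacity_pauli(p, d):
    """Entanglement-assisted capacity 2 log d - H(p)."""
    return 2 * math.log(d) - shannon_entropy(p)


def qubit_holevo_capacity_pauli(p):
    """Qubit case: log 2 - min over pairs i != j of h(p_i + p_j)."""
    assert len(p) == 4
    return math.log(2) - min(binary_entropy(p[i] + p[j])
                             for i, j in combinations(range(4), 2))


def quantum_capacity_phase_damping_pauli(p, d):
    """Case p_{i,j} = 0 for j != 0: coherent information log d - H(p)."""
    return math.log(d) - shannon_entropy(p)
```

For the noiseless qubit channel, `p = [1, 0, 0, 0]`, these give $2\log 2$, $\log 2$, and $\log 2$; for the uniform distribution on the four Pauli operators the first two vanish.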


9.7.5  PNS Channel 


The PNS channel satisfies condition (9.150). Hence, using (9.151), we have
\[
C_c(\kappa^{pns}_{d,n\to m}) = H(\kappa^{pns}_{d,n\to m}(\rho_{\mathrm{mix}})) - \min_\rho H(\kappa^{pns}_{d,n\to m}(\rho)) = \log\binom{m + d - 1}{d - 1}.
\]
Since $C_c^e(\kappa^{pns}_{d,n\to m})$ is less than the dimension of the input system, $C_c^e(\kappa^{pns}_{d,n\to m}) =
C_c(\kappa^{pns}_{d,n\to m})$. Its entanglement-assisted capacity is calculated as
\[
C_{c,e}^e(\kappa^{pns}_{d,n\to m}) = I(\rho_{\mathrm{mix}}, \kappa^{pns}_{d,n\to m})
= \log\binom{m + d - 1}{d - 1} + \log\binom{n + d - 1}{d - 1} - \log\binom{(n - m) + d - 1}{d - 1}.
\]
From Exercise 5.16, the wiretap channel $(\kappa^{pns}_{d,n\to m}, (\kappa^{pns}_{d,n\to m})^E)$ is a degraded channel.
Hence, from (9.155), its quantum capacity is calculated as
\[
C_{q,1}(\kappa^{pns}_{d,n\to m}) = I_c(\rho_{\mathrm{mix}}, \kappa^{pns}_{d,n\to m})
= -\log\binom{(n - m) + d - 1}{d - 1} + \log\binom{m + d - 1}{d - 1}. \tag{9.157}
\]
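The three PNS capacities are differences of logarithms of binomial coefficients, namely the dimensions of symmetric subspaces of $m$, $n$, and $n-m$ particles. A small sketch for evaluating them follows; it assumes natural logarithms, $0 \le m \le n$, and uses illustrative function names that do not appear in the text.

```python
import math


def log_sym_dim(k, d):
    """log of the dimension binom(k + d - 1, d - 1) of the k-particle symmetric subspace."""
    return math.log(math.comb(k + d - 1, d - 1))


def pns_classical_capacity(d, n, m):
    """C_c = log binom(m + d - 1, d - 1); it does not depend on n."""
    return log_sym_dim(m, d)


def pns_ea_capacity(d, n, m):
    """C_{c,e}^e = log dim(m) + log dim(n) - log dim(n - m)."""
    return log_sym_dim(m, d) + log_sym_dim(n, d) - log_sym_dim(n - m, d)


def pns_coherent_information(d, n, m):
    """I_c(rho_mix, kappa) = log dim(m) - log dim(n - m), as in (9.157)."""
    return log_sym_dim(m, d) - log_sym_dim(n - m, d)
```

When $m = n$ the channel is the identity on the symmetric subspace, and the expressions reduce to $\log\binom{n+d-1}{d-1}$, twice that value, and $\log\binom{n+d-1}{d-1}$, as expected.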


9.7.6 Erasure Channel 


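The erasure channel also satisfies condition (9.150). Hence, using (9.151), we have
\[
C_c(\kappa^{era}_{d,p}) = H(\kappa^{era}_{d,p}(\rho_{\mathrm{mix}})) - \min_\rho H(\kappa^{era}_{d,p}(\rho))
= -(1 - p)\log\frac{1 - p}{d} - p\log p - h(p) = (1 - p)\log d.
\]
Since it is attained by commutative input states, $\tilde{C}_c(\kappa^{era}_{d,p}) = C_c(\kappa^{era}_{d,p})$.
Next, we consider the capacity $C_c^e(\kappa^{era}_{d,p})$. Because
\[
(\kappa^{era}_{d,p})^{\otimes n}(\rho) = \sum_{\{i_1, \ldots, i_k\} \subset \{1, \ldots, n\}} (1 - p)^{n-k} p^{k}\, \mathrm{Tr}_{i_1, \ldots, i_k}\, \rho \otimes |u_d\rangle\langle u_d|^{\otimes k},
\]
the minimum entropy $\min_\rho H((\kappa^{era}_{d,p})^{\otimes n}(\rho))$ is calculated as
\[
\begin{aligned}
\min_\rho H((\kappa^{era}_{d,p})^{\otimes n}(\rho))
&= \min_\rho \sum_{\{i_1, \ldots, i_k\} \subset \{1, \ldots, n\}} (1 - p)^{n-k} p^{k} \bigl(-\log((1 - p)^{n-k} p^{k}) + H(\mathrm{Tr}_{i_1, \ldots, i_k}\, \rho)\bigr) \\
&= n h(p) + \min_\rho \sum_{\{i_1, \ldots, i_k\} \subset \{1, \ldots, n\}} (1 - p)^{n-k} p^{k} H(\mathrm{Tr}_{i_1, \ldots, i_k}\, \rho) = n h(p).
\end{aligned}
\]
Hence, from (9.152),
\[
C_c^e(\kappa^{era}_{d,p}) = \tilde{C}_c(\kappa^{era}_{d,p}) = C_c(\kappa^{era}_{d,p}) = (1 - p)\log d.
\]
The entanglement-assisted capacity is calculated as
\[
\begin{aligned}
C_{c,e}^e(\kappa^{era}_{d,p}) &= I(\rho_{\mathrm{mix}}, \kappa^{era}_{d,p}) \\
&= \log d - (1 - p)\log\frac{1 - p}{d} - p\log p + p\log\frac{p}{d} + (1 - p)\log(1 - p) \qquad (9.158) \\
&= 2(1 - p)\log d,
\end{aligned}
\]
where we used Exercise 5.15 in (9.158).
From Exercise 5.15, the wiretap channel $(\kappa^{era}_{d,p}, (\kappa^{era}_{d,p})^E)$ is a degraded channel.
Hence, from (9.155), its quantum capacity is calculated as [67]
\[
C_{q,1}(\kappa^{era}_{d,p}) = I_c(\rho_{\mathrm{mix}}, \kappa^{era}_{d,p}) = (1 - 2p)\log d \quad \text{for } p \le 1/2.
\]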
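The erasure-channel values can be re-derived from three entropies. The sketch below is illustrative, not the book's computation: it assumes that the channel keeps the input with probability $1-p$ and otherwise outputs an orthogonal flag state, so that the output for $\rho_{\mathrm{mix}}$ has entropy $h(p) + (1-p)\log d$, the minimum output entropy is $h(p)$, and the joint output for the maximally entangled input has entropy $h(p) + p\log d$. Natural logarithms are used and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) with natural logarithm."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)


def erasure_entropies(d, p):
    """Return (H(kappa(rho_mix)), min_rho H(kappa(rho)), H((kappa x id)(Phi_d)))."""
    h_out_mix = binary_entropy(p) + (1 - p) * math.log(d)
    h_out_min = binary_entropy(p)                      # pure input state
    h_joint = binary_entropy(p) + p * math.log(d)      # reference stays mixed if erased
    return h_out_mix, h_out_min, h_joint


def erasure_capacities(d, p):
    """Return (C_c, C_{c,e}^e, I_c(rho_mix, kappa)) from the three entropies."""
    h_out_mix, h_out_min, h_joint = erasure_entropies(d, p)
    classical = h_out_mix - h_out_min                  # (9.151)
    ent_assisted = math.log(d) + h_out_mix - h_joint   # I(rho_mix, kappa)
    coherent = h_out_mix - h_joint                     # I_c(rho_mix, kappa)
    return classical, ent_assisted, coherent
```

For any $d$ and $p$ the three returned values equal $(1-p)\log d$, $2(1-p)\log d$, and $(1-2p)\log d$; the last one is the quantum capacity only in the range $p \le 1/2$ stated above.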


9.7.7 Phase-Damping Channel 


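Any phase-damping channel $\kappa^{PD}_D$ clearly satisfies
\[
C_c^e(\kappa^{PD}_D) = C_c(\kappa^{PD}_D) = \tilde{C}_c(\kappa^{PD}_D) = \log d.
\]
Indeed, we can show that the wiretap channel $(\kappa^{PD}_D, (\kappa^{PD}_D)^E)$ is a degraded channel as
follows. When the input state is the maximally entangled state $\frac{1}{d}\sum_{k,l} |e_k, e_k^R\rangle\langle e_l, e_l^R|$,
the purification of the output state $\frac{1}{d}\sum_{k,l} d_{k,l} |e_k, e_k^R\rangle\langle e_l, e_l^R|$ is given as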


1 


pace R OE R OE 
7 . VK IU Cs Ces Ce) (Els es Cr |s 
A 


kk LU 
where Y = (yx.4/) satisfies Y*Y = D. From the condition X;., = 1, the positive 


semidefinite matrix pf = Ser Yewver ler) (ef | satisfies the condition of states 
Tr pe = 1. Then, by applying (8.32), the channel («5?)* to the environment is 
described as 


(Kp) (p) = TrraUa.e @p") >) yew Terler ek, et (er, ef, ef | 
kk! LU 


=>) raw Verlee) (ef | = Dd lenlny (pyle) Yew er lee) er | 
k 


k 


= leap (Plex) og 
k 


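Hence, the wiretap channel $(\kappa^{PD}_D, (\kappa^{PD}_D)^E)$ is a degraded channel. Further, the phase-
damping channel $\kappa^{PD}_D$ satisfies the invariance
\[
I_c(U_\theta \rho U_\theta^*, \kappa^{PD}_D) = I_c(\rho, \kappa^{PD}_D), \quad U_\theta := \sum_k e^{i\theta_k} |e_k\rangle\langle e_k|.
\]
Hence, using concavity (9.119), we have
\[
\begin{aligned}
C_{q,1}(\kappa^{PD}_D) &= \max_\rho I_c(\rho, \kappa^{PD}_D) = \max_p I_c\Bigl(\sum_k p_k |e_k\rangle\langle e_k|, \kappa^{PD}_D\Bigr) \\
&= \max_p \Bigl[ H(p) - H\Bigl(\sum_{k,l} \sqrt{p_k p_l}\, d_{k,l} |e_k\rangle\langle e_l|\Bigr) \Bigr] \ge \log d - H\Bigl(\frac{D}{d}\Bigr).
\end{aligned}
\]
Further, since
\[
C_{\mathrm{SDP}}(\kappa^{PD}_D) = \max_p I_c\Bigl(\sum_k p_k |e_k\rangle\langle e_k|, \kappa^{PD}_D\Bigr) \tag{9.159}
\]
is shown below, Theorem 9.12 implies that
\[
C_{q,2}(\kappa^{PD}_D) = C^{\dagger}_{q,2}(\kappa^{PD}_D).
\]
That is, the channel $\kappa^{PD}_D$ satisfies the strong converse property for quantum state
transmission [65].

(9.159) can be shown as follows. Notice that the channel $\kappa^{PD}_D$ is covariant with
respect to the unitary $U_\theta = \sum_k e^{i\theta_k} |e_k\rangle\langle e_k|$. Since the SDP bound $I_{\mathrm{SDP}}(\rho, \kappa^{PD}_D)$ is concave
for $\rho$ (Exercise 9.24, Lemma 9.11), the maximum $C_{\mathrm{SDP}}(\kappa^{PD}_D) = \max_\rho I_{\mathrm{SDP}}(\rho, \kappa^{PD}_D)$ is realized
when $\rho$ is invariant with respect to the unitary $\sum_k e^{i\theta_k} |e_k\rangle\langle e_k|$. Hence, $C_{\mathrm{SDP}}(\kappa^{PD}_D)$ is calculated
to be $\max_p I_{\mathrm{SDP}}(\sum_k p_k |e_k\rangle\langle e_k|, \kappa^{PD}_D)$. Since the state $\rho_p := \kappa^{PD}_D(\sum_{k,l} \sqrt{p_k p_l}\, |e_k,
e_k^R\rangle\langle e_l, e_l^R|)$ is maximally correlated, (8.143) and (8.227) imply that
\[
\begin{aligned}
I_c\Bigl(\sum_k p_k |e_k\rangle\langle e_k|, \kappa^{PD}_D\Bigr) &= H\Bigl(\kappa^{PD}_D\Bigl(\sum_k p_k |e_k\rangle\langle e_k|\Bigr)\Bigr) - H(\rho_p) = E_{r,S}(\rho_p) \\
&= E_{\mathrm{SDP}}(\rho_p) = I_{\mathrm{SDP}}\Bigl(\sum_k p_k |e_k\rangle\langle e_k|, \kappa^{PD}_D\Bigr).
\end{aligned}
\]
So, we obtain (9.159).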


9.8 Proof of Theorem 9.3 


First we prove Lemma 9.1. 
Proof of Lemma 9.1 


S⇒A: From (9.30), S implies (9.34), i.e., A.
S⇒L: From (9.31),




: 1,27 1,2 
cane alti ag S20) 
p'?:Tr pl:2(X!4X2)<K 

i 1,1 27,2 

= min fe yt+ fr) 
p'?:Tr pl:2(X!4X2)<K 
: 17,1 27 2 
= omn  flh + P) 
p',p?:Tr p'X!'4Tr p?X?2<K 
: ‘ em 
= mn min + 
O<A<1 p!:Tr p!X!<AK f (e ) p 


f?(p’). 


mi 
2: Tr p2X2<(1-A)K 
On the other hand, since f!(p!) + f?(p?) = f!?(p! @ p’), we have 
on, main fp) + f°") 
p',p2:Tr p' X!4Tr p?X2<K 
> min Fp! @ p?) 
p!,p?:Tr p'X!4Tr p?X2<K 


> min Ce 
pl: Tr p!:2(X!4X2)<K 


Hence, we obtain (9.33). 




LSC: Choose pj” such that Tr py?(X! + X2) — f)?(p)7) = max,i2 Tr p)?(X! 


d 


+ X?)— fA Gp), Then, the real number K = Tr age. + X7) satisfies 


max Tr p(x! zig x?) _ f?(p') 


p 
= max Trp’? (x! + X) -_ f ?(p') 
p'?:Tr pl2(X!4X2)>K 
=K+ max | — (0!) 
pl: Tr pl2(X!4X2)>K 
=K + max —fl(p') — f7(p") 


p!,p?:Tr p'X!4-Tr p?X?>K 


= max Tek’ =f Oy tik fey 


ph p2sTr p!X!4-Tr p? X2>K 
<maxTr p'X! — f'(p!) + Tr p’X? — f°(p"). 
ppm 


Conversely, from (9.30), 
max Trp’? (x! 4 xX) — f ?(p') 
pl 
> max Tr p(X! + Ds = Fp @ p) 
pp? 


> max Tr p'X! — f'(p!) + Tr p’X? — f*(p’). 
psp 


Hence, we obtain (9.32). 


C=S: For any pe, from Lemma A.8, we choose Hermitian matrices X! and X? 


such that Tr p},X! — f!(p5) = max, Tr p'X' — f'(p'). Hence, 
2
> Tr 9X! — f'(0) = > ma Tr p'X' — fi(p') 
i i=1 
= max Tr p!?(X! + X7) — fl?(ph?) > Tr py?(X! + X7) — f(g"). 
zi 


Since Tr oe +X?) =Tr pox + Tr px we have (9.31). a 


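Proof of HM⇒HC First, we assume that there exists a channel $\kappa_{X,p}$ for any channel
$\kappa$, any positive semidefinite Hermitian matrix $X$ on the input system $\mathcal{H}_A$, and any
probability $p$, such that
\[
C_c(\kappa_{X,p}) = \max_\rho\, (1 - p)\chi_\kappa(\rho) + p\, \mathrm{Tr}\, X\rho, \tag{9.160}
\]
and
\[
\begin{aligned}
C_c(\kappa^1_{X^1,p} \otimes \kappa^2_{X^2,p})
= \max_\rho \bigl( & (1 - p)^2 \chi_{\kappa^1 \otimes \kappa^2}(\rho) + (1 - p)p(\chi_{\kappa^1}(\rho^1) + \mathrm{Tr}\, X^2\rho^2) \\
& + (1 - p)p(\chi_{\kappa^2}(\rho^2) + \mathrm{Tr}\, X^1\rho^1) + p^2(\mathrm{Tr}\, X^1\rho^1 + \mathrm{Tr}\, X^2\rho^2) \bigr).
\end{aligned} \tag{9.161}
\]
The channel $\kappa_{X,p}$ is called the Shor extension [14] of $\kappa$. Applying Condition HM to the
channel $\kappa^1_{\frac{1}{p}X^1,p} \otimes \kappa^2_{\frac{1}{p}X^2,p}$, we have
\[
\begin{aligned}
\max_{\rho^{1,2}} \Bigl\{ & (1 - p)^2 \chi_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) + (1 - p)p\Bigl(\chi_{\kappa^1}(\rho^1) + \mathrm{Tr}\, \tfrac{1}{p}X^2\rho^2\Bigr) \\
& + (1 - p)p\Bigl(\chi_{\kappa^2}(\rho^2) + \mathrm{Tr}\, \tfrac{1}{p}X^1\rho^1\Bigr) + p^2\Bigl(\mathrm{Tr}\, \tfrac{1}{p}X^1\rho^1 + \mathrm{Tr}\, \tfrac{1}{p}X^2\rho^2\Bigr) \Bigr\} \\
\le{} & \max_{\rho^1} (1 - p)\Bigl(\chi_{\kappa^1}(\rho^1) + \mathrm{Tr}\, \tfrac{1}{1 - p}X^1\rho^1\Bigr) + \max_{\rho^2} (1 - p)\Bigl(\chi_{\kappa^2}(\rho^2) + \mathrm{Tr}\, \tfrac{1}{1 - p}X^2\rho^2\Bigr).
\end{aligned}
\]
Taking the limit $p \to 0$, we obtain
\[
\max_{\rho^{1,2}} \chi_{\kappa^1 \otimes \kappa^2}(\rho^{1,2}) + \mathrm{Tr}(X^1 + X^2)\rho^{1,2}
\le \max_{\rho^1}(\chi_{\kappa^1}(\rho^1) + \mathrm{Tr}\, X^1\rho^1) + \max_{\rho^2}(\chi_{\kappa^2}(\rho^2) + \mathrm{Tr}\, X^2\rho^2),
\]
which implies Condition HC.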

Next, we define the channel $\kappa_{X,p}$ with the input system $\mathcal{H}_A \otimes \mathbb{C}^k$, where $k > \|X\|$,
and check (9.161). First, we generate a one-bit random number $X$ with probabilities
$P_0 = 1 - p$ and $P_1 = p$. When $X = 0$, the output state is $\kappa(\mathrm{Tr}_{\mathbb{C}^k}\, \rho)$ for the input
state $\rho$. Otherwise, we perform the measurement of the spectral decomposition of $X$,
and send the receiver the state $\kappa^{y}(\mathrm{Tr}_A\, \rho)$ depending on its measurement outcome
$y$, which is an eigenvalue of $X$. Here, we defined the channel $\kappa^{y}$ and the stochastic
transition matrix $Q^y$ such that

(0) = >) >) OF lui) (uil(ujloluj), y = CQ) = T(pmix, Q)- 
Lj 


In this case, we assume that the receiver receives the information $X$ and $y$. Then,
the relation (9.160) holds. From these discussions, we can check the equation
(9.161). □


Proof of EM=>EC We define the channel #7,» with the input system 7/4 as follows 
First, we generate one-bit random number X with probabilities Po = 1 — p and 
P, = p. When X = 0, the output state is «(Tre p) for the input state p. When 
X = 1, we perform the measurement of the spectral decomposition of H,, and obtain 
the eigenvalue y of H. Then, the output state is p,, where p, satisfies H(p,) = y. 
In this case, the receiver is assumed to receive the information X and y. Then, the 
output entropy of the channel #7,» can be calculated as 


H (kx,p(p)) = (1 — p)H(«(p)) + p Tr Xp + h(p) — pH(P**). 
Further, 
H(Ryi,, @ Ryo, p(p)) 
=(1— p)?(H(«! @ «7(p))) + pC — p)(Tr X'p! + H(K?(p”))) 
+ p(l— p)(Tr X*p? + H(K'(p'))) + p? (Tr X'p' + Tr X7p”) + 2h(p) 
— pH’ *') — pH”). 


Condition EM implies 


: 1,2) _ 3 1 : 2 
min ge. (eo )=mnHz (9 )+minH  (p*). 
pi2 pXlp 5X? p! pp pe pep 


Since HP) < log d,dg, taking the limit p — 0, we have 


A(K1y,p p) > H,(p)+Tr Xp 
 (P) > Agign2(p) + (Tr X'p! + Tr X7p’). 
: 


Pal 22 
K OR 
A 1 
3X1 p px 
Since the set of density matrices is compact, we obtain

min Ay? yt +x in 

aE 


= min(A,. (po!) + Hya(p?) + Tr X'p! + Tr X7p", 
Psp 


which implies EC. □
Proof of FA⇒FS See Pomeransky [15].


9.9 Historical Note 


9.9.1 Additivity Conjecture 


Bennett et al. [68] considered the transmission of classical information by using entan-
gled states as input states. After this research, in order to consider the additivity
of the classical channel capacity, Nagaoka [69] proposed quantum analogs of the
Arimoto–Blahut algorithms [70, 71], and Nagaoka and Osawa [7] numerically ana-
lyzed two-tensor product channels in the qubit case with quantum analogs based
on these algorithms. In this numerical analysis, all the examined channels $\kappa$ satisfy
$C(\kappa^{\otimes 2}) = 2C(\kappa)$. This numerical analysis strongly suggests Conjecture HM. This
research was published by Osawa and Nagaoka [17]. Independently, King proved
Conditions HM, EM, and RM with $\kappa^1$ as a unital channel in the qubit system and
$\kappa^2$ as an arbitrary channel [10]. Following this result, Fujiwara and Hashizume [72]
showed HM and EM with $\kappa^1$ and $\kappa^2$ as depolarizing channels. Further, King [11]
proved HM, EM, and RM with only $\kappa^1$ as a depolarizing channel. Shor [12] also
proved HM with only $\kappa^1$ as an entanglement-breaking channel.

On the other hand, Vidal et al. [73] pointed out that the entanglement of formation
is $\log 2$ when the support of the state is contained in the antisymmetric space of $\mathbb{C}^3$.
Following this research, Shimono [74] proved FA when the supports of $\rho_1$ and $\rho_2$ are
contained in the antisymmetric space of $\mathbb{C}^3$; Yura [75] proved that $E_f(\rho) = E_c(\rho)$ for
this case. Further, using the idea in Vidal et al. [73], Matsumoto et al. [13] introduced
the MSW correspondence (9.8) or (9.29). Using this correspondence, they proved
FS⇒HM and FS⇒HL.

Following this result, Shor [14] proved HL⇒FA and HM⇒HL. Audenaert and
Braunstein [76] pointed out the importance of the conjugate function in this prob-
lem. Further, Pomeransky [15] proved the equivalence among FA, FC, and FS by
employing the idea by Audenaert and Braunstein [76]. Shor also showed FA⇒FS
independently. He also proved EM⇒FA and (HM or FA)⇒EM. Further, applying
this idea, Koashi and Winter [77] obtained relation (8.173). Recently, Matsumoto
[78] found short proofs of EM⇒EL and EL⇒ML. In this textbook, based on his
idea, we analyze the structure of equivalence among these conditions and derive 14
conditions (Theorem 9.3). Matsumoto [79] also introduced another measure of entan-
glement and showed that its additivity is equivalent to the additivity of entanglement
of formation.

Further, Matsumoto and Yura [18] showed $E_f(\rho) = E_c(\rho)$ for antisymmetric
states. Applying the concept of channel states to antisymmetric states, they proved
that $C(\kappa^{\otimes n}) = nC(\kappa)$ for antisymmetric channels. Indeed, this channel has been pro-
posed by Werner and Holevo [80] as a candidate for a counterexample of Additivity
HM or EM because they showed that it does not satisfy Condition RM for sufficiently
large $s$. Vidal et al. implicitly applied the same concept to entanglement-breaking
channels and proved FA when only $\rho_1$ satisfies condition (8.147). Following the discov-
ery of this equivalence, Datta et al. [20] and Fannes et al. [19] showed HM and EM
when $\kappa^1$ and $\kappa^2$ are transpose depolarizing channels. Wolf and Eisert [22], Fukuda
[9], and Datta and Ruskai [21] extended the above results to larger classes of channels.

However, despite so many equivalent conditions, Hastings [81] showed the
existence of a counterexample to the superadditivity of entanglement of formation.
Hence, it was shown that none of these equivalent conditions holds in general.


9.9.2 Channel Coding with Shared Entanglement 


Concerning the channel coding with shared entanglement, Bennett and Wiesner 
[24] found the effectiveness of shared entanglement. Assuming Theorem 4.1 in the 
nonorthogonal two-pure-state case,⁸ Barenco and Ekert [83] proved the direct part
of Theorem 9.4 in the two-dimensional pure-state case. Hausladen et al. [84] inde- 
pendently proved the unitary coding version of Theorem 9.4 in the two-dimensional 
pure-state case. Bose et al. [25] showed the direct part of Theorem 9.4 in the two- 
dimensional mixed-state case. Hiroshima [26] showed the unitary coding version of 
Theorem 9.4 in the general mixed-state case. Bowen [27] independently showed the 
same fact in the two-dimensional case. Finally, Horodecki et al. [28] and Winter [29] 
independently proved Theorem 9.4 in the form presented in this book. When the 
channel has noise, Bennett et al. [30] showed the direct part of Theorem 9.5 in the 
general case and its converse part in the generalized Pauli case. In this converse part, 
they introduced the reverse Shannon theorem. Following this result, Bennett et al. 
[31] and Holevo [32] completed the proof of Theorem 9.5. In this book, we proved 
this theorem in a way similar to Holevo [32]. 


⁸ In their paper, it is mentioned that Levitin [82] showed the direct part of Theorem 4.1 in this special
case.
9.9.3 Quantum-State Transmission


Many researchers have treated the capacity of quantum-state transmission via a noisy 
quantum channel by algebraic methods first [54-60]. This approach is called quan- 
tum error correction. Using these results, Bennett et al. [64] discussed the relation 
between quantum error correction and entanglement of distillation. Following these 
studies, Schumacher [85] introduced many information quantities for noisy channels 
(Sect. 8.2). Barnum et al. [86] showed that a capacity with the error $\varepsilon_2[\Phi]$ is less
than $\lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n})$ if the encoding is restricted to being an isometry.
Barnum et al. [66] proved the coincidence of the two capacities $C_{q,1}(\kappa)$ and $C_{q,2}(\kappa)$. They
also showed that these capacities are less than $\lim_{n\to\infty} \frac{1}{n} \max_{\rho \in \mathcal{S}(\mathcal{H}_A^{\otimes n})} I_c(\rho, \kappa^{\otimes n})$. On
the other hand, Lloyd [63] predicted that the bound $I_c(\rho, \kappa)$ could be achieved without
a detailed proof, and Shor [62] showed its achievability. Then, the capacity theorem
for quantum-state transmission (Theorem 9.10) was obtained. Further, Devetak [43] 
formulated a capacity theorem for quantum wiretap channels (Theorem 9.8). Apply- 
ing this discussion, he gave an alternative proof of Theorem 9.10. Here, the bit error 
of state transmission corresponds to the error of normal receiver in wiretap channels, 
and the phase error of state transmission corresponds to information obtained by the 
eavesdropper in a wiretap channel. Indeed, the analysis of information obtained by 
the eavesdropper is closely related to the channel resolvability. Hence, in this book, 
we analyze quantum-channel resolvability first. Then we proceed to quantum wire- 
tap channels and quantum-state transmission. Indeed, Devetak [43] also essentially 
showed the direct part of quantum-channel resolvability (Theorem 9.7) in the tensor 
product case; however, our proof of it is slightly different from the proof by Devetak. 

Indeed, to obtain the capacity theorem for quantum-state transmission from the 
capacity theorem for quantum wiretap channels, we need an additional discussion. 
That is, we need to show that the entanglement fidelity is close to 1 when the bit
error and the phase error of state transmission are close to 0. To clarify this point, we 
employ a duality relation between the security and the coherence. That is, combining 
Theorem 8.17 and Lemma 9.4, we show the capacity theorem for quantum-state 
transmission. Further, Devetak and Shor [87] studied the asymptotic tradeoff between 
the transmission rates of transmissions of quantum-state and classical information. 

When a channel is degraded, the wiretap capacity can be single-letterized as 
(9.75). This formula was obtained independently in the original Japanese version of 
this book in 2004 and by Devetak and Shor [87]. By applying this relation to quantum 
capacity, the single-letterized capacity is derived independently in several examples 
in this English version and Yard [88]. 

The strong converse of quantum-state transmission is more difficult. Morgan et
al. [89] demonstrated that a “pretty strong converse” holds for degradable quantum
channels, i.e., they showed that there is (at least) a jump in the quantum error from zero
to 1/2 once the communication rate exceeds the quantum capacity. Then, Tomamichel
et al. [65] showed Theorem 9.12. The proof essentially employed the strong converse
of entanglement distillation (8.226), which was shown in the first edition of this book.
To obtain Theorem 9.12, we additionally need to show Lemma 9.12. Note that the
bound given in Theorem 9.12 is not necessarily attained in general. They showed
that the bound is attained for the phase-damping channel defined in Example 5.10.


9.10 Solutions of Exercises 


Exercise 9.1 Assume that the initial state on 7/¢ is yy x,|uC). Then, the initial 
state in the total system is >°, ; sate lu, uP, we). When the measurement outcome 
is (i, j), the resultant state of this measurement is 


Cc A Cc i AT A B Cc 
mp2 Saralet fh ue) = Suh afl RELL) Do xeluf’, uP, uf) 


t kl 


“Sow ur (RZ) x)glup, uP, ug) = > (R'Z/)" x), (ug [laf uP) 


t kl k,l 


= >\(R'Z!)" x) up) = (XLZh)" Do xeluz) 
k k 


because all of outcomes occur with the equal probability a. Thus, the final state is 
XipZig Xp Zp)" Dy rele) = Dy Xelue)- 


Exercise 9.2 For any input state p, on the composite system, we have 


H((n! ®K°)(p,)) 2 HCY OF ph, @KL,)) 


2(x O* H(K2(p2.,)) + 10 Or! ) 
=H(L9; Phy) +Xe H(W(p2.,)) 


> min H(* (p' + min 102 (p°)), 


(a) and (b) follow from (9.35) and (5.110), respectively. 


a 


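Exercise 9.3 Applying Theorem 4.1 to the channel $\kappa^{\otimes n}$, we find that $C_c^e(\kappa) \ge \frac{C_c(\kappa^{\otimes n})}{n}$,
which implies that $C_c^e(\kappa) \ge \sup_n \frac{C_c(\kappa^{\otimes n})}{n}$. Since $\sup_n \frac{C_c(\kappa^{\otimes n})}{n} = \lim_{n\to\infty} \frac{C_c(\kappa^{\otimes n})}{n}$ fol-
lows from Lemma A.1, it is sufficient to show
\[
C_c^e(\kappa) \le \lim_{n\to\infty} \frac{C_c(\kappa^{\otimes n})}{n}. \tag{9.162}
\]
Consider the code $\Phi^{(n)} = (N_n, \varphi, Y)$ satisfying $\varepsilon[\Phi^{(n)}] \to 0$. Since
$C_c(\kappa^{\otimes n}) = \sup_p I(p, \kappa^{\otimes n})$, similar to (4.32) in the proof of Theorem 4.2, the Fano
inequality (2.35) yields that
\[
\frac{1}{n} \log N_n \le \frac{(\log 2)/n + C_c(\kappa^{\otimes n})/n}{1 - \varepsilon[\Phi^{(n)}]}. \tag{9.163}
\]
Since $\varepsilon[\Phi^{(n)}] \to 0$, we have $\lim_{n\to\infty} \frac{1}{n} \log N_n \le \lim_{n\to\infty} \frac{C_c(\kappa^{\otimes n})}{n}$, which implies
(9.162).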


Exercise 9.4 Using the monotonicity of the quantum relative entropy and (5.86), we 
obtain 


1(X:Y) 
i i) 
= Nr r((@ @ U5") (a's) | a DL (82 @ @ 65") (a's 
=] nj 
ibe gee 
=H(5- (22° @ 65") O's ) — LH (HH B18") (2's) 


i=l "j=l 


1 Nn 
<H( Tra; ra ye (i) @ i ees ) 


i=1 


4 (Tr — 5 (a) @ 3") Oe ) — min H((s ® 13")(p2"5). 
" j=l 
Using 
Tra, (OO) @15") (i's) = ate p 
(te 5 a (o @ @ 28") (p9"5 ) < logdim H,,, 
we obtain 


1(X:Y) <H(Tra, pa) + log dim Hy, — min H (( & 08") (p2"5)) 


=nH (Tra pa.) + logdim Hy, — min H((K® te a a): 


Exercise 9.5 The LHS of (9.48) can be rewritten as 


> pid [0 velj) @ te) (pa.a)ll >. Pi(K © Gelj) @ Le)(Par.R) 
j j 


<> piD((Ko velj) ® te )(pa.a)ll {>| Pilko Ge(/))pa”) ® pr 
Ei j 
=H | >) pj(ko ge j))oa | + (pr) — >) Pi H(K 0 Gel J) ® tr) pa’,8)
Jj Jj 


=H(K | >) pjve(i(pa)) + >) Pile(par, © Ge A) 
j j 

<H(K | >) pive(i)(pa))) + LCD PiPe(I) (pa), 
i j 


=I | >" piveli(oa). # | 
j 


from (4.7) and (8.52). Since $\rho_{A'R}$ is a pure state, we may write $H(\rho_{A'}) = H(\rho_R)$.


Exercise 9.6 Consider the code ®{”"? := (Ha:, Hr, x", Nn, el”, Y) satisfying 
that e[(O™:7] — 0. Due to (9.48) and (9.49) similar to (4.32) in the proof of Theorem 
4.2, the Fano inequality (2.35) yields that 


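\[
\frac{1}{n} \log N_n \le \frac{(\log 2)/n + \max_\rho I(\rho, \kappa)}{1 - \varepsilon[\Phi_e^{(n),2}]}. \tag{9.164}
\]
Since $\varepsilon[\Phi_e^{(n),2}] \to 0$, we have $\lim \frac{1}{n} \log N_n \le \max_\rho I(\rho, \kappa)$, which implies the $\le$
part of (9.43).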


Exercise 9.7 


(a) Since all of POVM elements of M consist of real entries, H(K(p)) = H(«(Re p)). 
An arbitrary real pure state is given as the form |@)(6|, where |@) = cos £10) + 
sin | 1). Then, H(«(|9)(@|)) = f (0). Hence, we have 


a H(K(p)) = min H(K(0)(6|)) = min f (8). (9.165) 


Thus, 
C.(K) < max H(K(p')) — min H(«(p)) 
p p 
= H(K(Pmix)) — min f(0) = log 4 — min f(6). (9.166) 
Further, when we generate the state |@9) (@o| and the state |@) + 7)(@ + 7| with the 


same probability 5 for 0p := argmax, f (@), the transmission information J(p, k) = 
log 4 — ming f (@) holds. 
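(b) Since the calculation of $\frac{df}{d\theta}(\theta)$ is easy, we calculate only $\frac{d^2 f}{d\theta^2}(\theta)$ as follows.

\[
\begin{aligned}
\frac{d^2 f}{d\theta^2}(\theta)
&= \frac{\cos\theta}{4}\log\frac{1+\cos\theta}{1-\cos\theta} + \frac{\sin\theta}{4}\log\frac{1+\sin\theta}{1-\sin\theta}
 - \frac{\sin\theta}{4}\Big(\frac{\sin\theta}{1-\cos\theta} + \frac{\sin\theta}{1+\cos\theta}\Big)
 - \frac{\cos\theta}{4}\Big(\frac{\cos\theta}{1+\sin\theta} + \frac{\cos\theta}{1-\sin\theta}\Big)\\
&= \frac{\cos\theta}{4}\log\frac{1+\cos\theta}{1-\cos\theta} + \frac{\sin\theta}{4}\log\frac{1+\sin\theta}{1-\sin\theta}
 - \frac{1-\cos^2\theta}{4}\Big(\frac{1}{1-\cos\theta} + \frac{1}{1+\cos\theta}\Big)
 - \frac{1-\sin^2\theta}{4}\Big(\frac{1}{1+\sin\theta} + \frac{1}{1-\sin\theta}\Big)\\
&= \frac{\cos\theta}{4}\log\frac{1+\cos\theta}{1-\cos\theta} + \frac{\sin\theta}{4}\log\frac{1+\sin\theta}{1-\sin\theta}
 - \frac{1}{4}(1+\cos\theta+1-\cos\theta) - \frac{1}{4}(1-\sin\theta+1+\sin\theta)\\
&= \frac{\cos\theta}{4}\log\frac{1+\cos\theta}{1-\cos\theta} + \frac{\sin\theta}{4}\log\frac{1+\sin\theta}{1-\sin\theta} - 1.
\end{aligned}
\]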
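The closed form for the second derivative can be checked numerically. The sketch below assumes that $f(\theta)$ is the entropy (natural logarithm) of the four outcome probabilities $(1\pm\cos\theta)/4$ and $(1\pm\sin\theta)/4$; this form is inferred from the terms in the derivation and is not restated in this solution. The function names are illustrative only.

```python
import math

def f(theta):
    """Entropy (nats) of the assumed outcome distribution ((1±cos θ)/4, (1±sin θ)/4)."""
    probs = [(1 + math.cos(theta)) / 4, (1 - math.cos(theta)) / 4,
             (1 + math.sin(theta)) / 4, (1 - math.sin(theta)) / 4]
    return -sum(p * math.log(p) for p in probs if p > 0)

def d2f_closed_form(theta):
    """Second derivative obtained in part (b)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c / 4) * math.log((1 + c) / (1 - c))
            + (s / 4) * math.log((1 + s) / (1 - s)) - 1)

def d2f_numeric(theta, h=1e-4):
    """Central second difference, used only as an independent check."""
    return (f(theta + h) - 2 * f(theta) + f(theta - h)) / h ** 2

# Minimum of f on a grid over [0, pi/2]; it is attained at the end points.
grid = [k * (math.pi / 2) / 1000 for k in range(1001)]
min_f = min(f(t) for t in grid)
capacity_bound = math.log(4) - min_f   # right-hand side of (9.166)
```

Under this assumption the grid minimum equals $\frac{3}{2}\log 2$, so the bound (9.166) evaluates to $\frac{1}{2}\log 2$.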


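(c) Since $\frac{d^2 f}{d\theta^2}(\theta) = \frac{\cos\theta}{4}\log\frac{1+\cos\theta}{1-\cos\theta} + \frac{\sin\theta}{4}\log\frac{1+\sin\theta}{1-\sin\theta} - 1$, we find that $\frac{d^2 f}{d\theta^2}(\theta)$ is monotonically decreasing from $\infty$ ($\theta = 0$) to its minimum $\frac{1}{\sqrt{2}}\log(1+\sqrt{2}) - 1 < 0$ ($\theta = \frac{\pi}{4}$) and that it is monotonically increasing from this value ($\theta = \frac{\pi}{4}$) to $\infty$ ($\theta = \frac{\pi}{2}$). Thus, we find that $\frac{df}{d\theta}(\theta)$ is positive for $\theta \in (0, \frac{\pi}{4})$ because $\frac{df}{d\theta}(0) = \frac{df}{d\theta}(\frac{\pi}{4}) = 0$. Similarly, we find that $\frac{df}{d\theta}(\theta)$ is negative for $\theta \in (\frac{\pi}{4}, \frac{\pi}{2})$ because $\frac{df}{d\theta}(\frac{\pi}{4}) = \frac{df}{d\theta}(\frac{\pi}{2}) = 0$. Since $f(\frac{\pi}{4}) = \frac{2+\sqrt{2}}{4}\log\frac{8}{2+\sqrt{2}} + \frac{2-\sqrt{2}}{4}\log\frac{8}{2-\sqrt{2}}$ and $f(0) = f(\frac{\pi}{2}) = \frac{3}{2}\log 2$, we obtain the behavior of $f(\theta)$ given in the table, which implies the equation (9.56).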


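Exercise 9.8 Since $J(\rho,\sigma,\kappa)$ is concave for $\rho$ and $\kappa(U_\theta\rho U_\theta^*) = \kappa(\rho)$, we have

\[
J(\rho,\sigma,\kappa) \le J\Big(\int_0^{2\pi} U_\theta\rho U_\theta^*\,\frac{d\theta}{2\pi},\sigma,\kappa\Big) = J(P_1\rho P_1 + P_2\rho P_2,\sigma,\kappa), \tag{9.167}
\]

where $U_\theta = P_1 + e^{i\theta}P_2$. The definition of $J(\rho,\sigma,\kappa)$ given in (9.44) implies that

\[
J(\lambda\rho_1\oplus(1-\lambda)\rho_2,\sigma,\kappa) = \lambda J(\rho_1,\sigma,\kappa) + (1-\lambda)J(\rho_2,\sigma,\kappa). \tag{9.168}
\]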


Hence, we obtain 


max I (p, &) 2 min max J(p, 0, ) 
p op 
(b). 
=min max J(Ap; @ (1 — A)po, 9, &) 
o ,p1,p2 


© min max AJ (p1,0, 6) + 1 — A)I(p2, 7, K) 
o X,p1.p2 


< ee AJ (pi, Ky] (Pmax,1); kK) + ad —_ A) J (pa, K1(Pmax,1)> kK) 
sP1sp2 


= max ACZ (Ki) + (1 — AYCe (Kz) = max(Ce (Ki), Coe(K2)), 


where (a), (b), and (c) follow from (9.46), (9.167), and (9.168), respectively. 

Now, we assume that «) satisfies (9.52) and Cf ,(K1) = Cé,(K2). Then, Ce ,(«) = 
Ce e(K1) and C.(K1) = CE, (K1). Since C.(K) < Co .(K) and C.(K1) < C.(K), we 
have C,(K) = Cé ,(k). 


‘ : — onR — on 
Exercise 9.9 Apply Lemma 9.2 to the case with M = e”" and o = o7 Binks for the 


channel W. Since I,,(p", W”) = nIj\,,(p, W), there exists a code ©” for the 
channel W™ such that |@”| = e”*® and 


[Wo", &") < max (4v2, Vm) eins (PWR), (9.169) 


where Un is the number of eigenvalues of City Piss: SINCE Up is polynomial, taking 
the limit, we obtain (9.72). 


Exercise 9.10 In this case, we can replace (9.65) by || P, W, ||) < Tr Wy Pe. 
Exercise 9.11 Since PZ (0) = PE) = pot p3 and PE, (1) = PE.) 0) = 


K(€0) K(€o) 


Pi + po, we have I (pmix, Q) = log2 — h(po + p3). Since Exercise 8.29 yields that 


1 1 
I (Pmix, w*) =H. (Kp, Pmix) — 5 Melk, leo) (eol) — 3 Helkp, le1) (e1|) 


=H(p) — h(po + ps), 


we have 
I(Pmix, Q) — I(pmix, W") = log2 — H(p). (9.170) 
Exercise 9.12 Since 


I(p, W®) + 1(p', W®) — 1(q, W? @ W*) 


-o( Sac. x WwW @ we (x row?) @ bs roow?)), 


xx! 
we have


1(p, W®) + 1(p', W*’) — 1(q, W® @W*) 


=0(« @ (ZX q(x, x')W2 @ v') | 


(k @ k’) (= row?) Q (= row?) 
<p( Sats. awe & ow?) (> rows). 


I(p, W®)+1(p', W") — 1(q, W= @W*’) 
<I(p, W®) + 1(p', W®') — 1g, W® @ W*’), 


Thus, 


which implies (9.100). 


Exercise 9.13 Applying Lemma 9.4 to the channel «®”", we have C°?“(«) > 
1 MAX ye5(742") I.(p, £2"), which implies (9.78). 
Exercise 9.14 Using the same discussion as the proof of Theorem 9.8, we find that 
Ce («) = lim ¢ supg sup, (I(p, 6°" Q) — I(p, Ke®" Q)). 

Denote the new capacities by C3“ (W) and C°?-¥ (i), respectively. Since the con- 
dition 1") — Ois weaker than the condition [;(®™) > 0, we have ce (W) = 


C3-E(W) and CEP F(a) > C&8-F(«), Due to (9.98), the condition ino”) > 0 
implies 


_ | 
C?-*(W) < lim — sup sup (1(p, W? Q) — I(p, W*" Q)). 
n@Q Pp 


We find that C2“ (W) = C2 (W). 
Similarly, we have 


a ha 1 n n 
Cerin) = lim — sup sup (I(p, k®"Q) — I(p, ke®"Q)), 
Q p 


which implies that C&?-* («) = C¢-¥ (x). 


Exercise 9.15 From the definition, we find that Cc (K) < Ce (4). It is sufficient 
to show Cf : Fc) > max, I (p, «). Consider the code given in the proof of Theorem 
9.5. In this code, the eavesdropper’s state does not depend on the message to be sent. 
So, we obtain C3“ («) > max, I(p, &). 
Exercise 9.16 Let $\mathcal{H}_E$ and $\mathcal{H}_{E'}$ be the environment systems of $\kappa$ and $\kappa'$, respectively. Then, the environment system of $\kappa'\circ\kappa$ is $\mathcal{H}_E\otimes\mathcal{H}_{E'}$. Thus, we have $\kappa'_E\circ\kappa(\rho) = \operatorname{Tr}_{\mathcal{H}_E}(\kappa'\circ\kappa)_E(\rho)$. Hence, (5.59) implies (9.79). Similarly, we have $\kappa_E(\rho) = \operatorname{Tr}_{\mathcal{H}_{E'}}(\kappa'\circ\kappa)_E(\rho)$. Thus, (5.59) implies (9.80).


Exercise 9.17 Denote the modified capacity by C-£ (W). First, note that 
egal ® SFM d\((W* Q);, (W* Q);) 
255 7 wt Q)i. _ DWE D);) 
The concavity and monotonicity (Exercise 5.34) of 79 imply that 


| 
In(®) = 7 DA (gy DW") ))) — HW Q):) 
i=l j=l 


1 M 
Sig 2,14 warm, — H(W*Q),)| 
hee 1 
E ae E . 
Sy DA (W a Q); } logd 
1 
+ no (« ee ee 
<€r,alP]logd + no(Ezal®)), (9.171) 


where the final inequality follows from Fannes’ inequality (5.92). Due to (9.171), 
when €¢.,[®™] — 0, we have feo) — 0. Exercise 9.14 yields that C2.“ (W) < 
Cc B.E(W). The opposite inequality ee holds from the following relation: 


my d\((W* Q);, (W*Q);) 
EE, al [® =e 2 M(M 7 D 
a) d\((W¥ Q);, 7 >, (W* O),) + di ((W* O);, 7 > (W* O)) 
= paps M(M — 1) 


i j#i 
OD) 25> DWE Oil > (W* Q),)'? + D(WFO) A> (W* O),)'” 
2M(M — 1) 


i “ 
=2 4 + DWE Qi = 7 HO ye 
L


1/2 

c 1 1 

a(x DW? ill ovo) = 1n(0)'?, (9.172) 
k 


where (a), (b), and (c) follow from the triangle inequality, quantum Pinsker inequal- 
ity (3.53), and the Jensen inequality for the function x +> x!/?, respectively. 


Exercise 9.18 Denote the modified capacity by C2-“(W). Since eg,4[6”] => 
£.wl®], we have C2=(W) < C2£(W) = C8-=(W). So, we will show the oppo- 
site inequality. Now, we consider i be a random variable subject to the uniform 
distribution on {1,..., M}. Then, Markov inequality (2.158) implies that 


1 : 1 
a f a Eevon) = 2100) | = 


Then, we number all elements of {i|D((W* Q);|lZ >°,(W* Q)x) < 2Ie(®)} as 


; (9.173) 


Nie 


(ow Q); 


ij,..-,2x, and denote the code whose message set consists of i,,...,ix, by é. 
Then, K > uw Similar to (9.172), we can show that 


A 


Ez,wlP] = sup sup di ((W” Q);,, (W Q)i,,) 
jos’ 


1 1 

< sup sup (acevo, Wp DWE De) + (WE Din» = DWE 2») 

jos k k 

1 
=2 sup d (wor iv, Yevon) 
J k 
1 

<sup D(WQ)i,-- > (WO)? < V21e(@). 

J k 


Hence, we have C2--(W) > C®-F(W). 
Exercise 9.19 We have 


(Op, W®) — 1(Op, W*) — >) pi(1(Q', W*) — 1(0', W*)) 
=>) (Op); DWP I|WG,) — DWF IW6,) 
j 
— > >): Q5 (Dw) | WE,) — DWF IWE,)) 


ij 
=>) > iQ) (DWWEllWG,) — DWEIWG,)) = Lp, W8Q) — T(p. W*Q). 
ij 


Since (5.59) implies I(p, W? Q) — I(p, W® Q) = I(p, W2Q) — I(p, K(W? Q)) 
> 0, we have (9.76). 
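Exercise 9.20 Consider (8.25) with $a = 1/2$. When $\varepsilon_2[\Phi] \to 0$, we have $\varepsilon_1[\Phi] \to 0$. Hence, $C_{q,2}(\kappa) \le C_{q,1}(\kappa)$.


Exercise 9.21 Due to (8.26), when $\varepsilon_1[\Phi] \to 0$, we have $\varepsilon_2[\Phi] \to 0$. Hence, $C_{q,2}(\kappa) \ge C_{q,1}(\kappa)$, and combining this with Exercise 9.20 gives $C_{q,2}(\kappa) = C_{q,1}(\kappa)$.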


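Exercise 9.22 Assume that two channels $\kappa_{1,n}$ and $\kappa_{2,n}$ from a system $\mathcal{H}_n$ to another system $\mathcal{H}'_n$ satisfy $\max_x 1 - F^2\big((\kappa_{1,n}\otimes\iota_R)(|x\rangle\langle x|), (\kappa_{2,n}\otimes\iota_R)(|x\rangle\langle x|)\big) \to 0$. Then, (3.48) implies that

\[
\max_x \big\|\kappa_{1,n}\otimes\iota_R(|x\rangle\langle x|) - \kappa_{2,n}\otimes\iota_R(|x\rangle\langle x|)\big\|_1 \to 0. \tag{9.174}
\]

Hence,

\[
\max_x \big\|\kappa_{1,n}(\operatorname{Tr}_R|x\rangle\langle x|) - \kappa_{2,n}(\operatorname{Tr}_R|x\rangle\langle x|)\big\|_1 \to 0. \tag{9.175}
\]

By applying Fannes inequality (Theorem 5.12) to the two states $(\kappa_{1,n}\otimes\iota_R)(|x\rangle\langle x|)$ and $(\kappa_{2,n}\otimes\iota_R)(|x\rangle\langle x|)$, (9.174) yields

\[
\frac{\big|H(\kappa_{1,n}\otimes\iota_R(|x\rangle\langle x|)) - H(\kappa_{2,n}\otimes\iota_R(|x\rangle\langle x|))\big|}{\log(\dim\mathcal{H}_n\dim\mathcal{H}'_n)} \to 0.
\]

Similarly, (9.175) yields

\[
\frac{\big|H(\kappa_{1,n}(\operatorname{Tr}_R|x\rangle\langle x|)) - H(\kappa_{2,n}(\operatorname{Tr}_R|x\rangle\langle x|))\big|}{\log(\dim\mathcal{H}_n\dim\mathcal{H}'_n)} \to 0.
\]

Since $I_c(\operatorname{Tr}_R|x\rangle\langle x|,\kappa) = H(\kappa(\operatorname{Tr}_R|x\rangle\langle x|)) - H(\kappa\otimes\iota_R(|x\rangle\langle x|))$, we obtain

\[
\frac{\big|\max_\rho I_c(\rho,\kappa_{1,n}) - \max_\rho I_c(\rho,\kappa_{2,n})\big|}{\log(\dim\mathcal{H}_n\dim\mathcal{H}'_n)} \le \frac{\max_\rho\big|I_c(\rho,\kappa_{1,n}) - I_c(\rho,\kappa_{2,n})\big|}{\log(\dim\mathcal{H}_n\dim\mathcal{H}'_n)} \to 0.
\]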


Exercise 9.23 


(a) Let (H(, 7, v) be ®. Apply the property @ of Sect. 8.2. Let 7c be the subspace of 
H with the dimension min(H, 71,4). Then, we can choose an isometry U from He 
to H4 such that 


2 
F; (Pmix, YO KO T) < F.(pmix, ¥9 KO Ky), 
which implies that
20 — Fe(pmix, Y9KOT)) S1— Fe(pmix, Y0 KO Ky). (9.176) 


Then, the code (H4, ky, V) satisfies the desired properties for &’. 
(b) Since the first inequality is trivial, we will show the remaining inequalities. (8.39) 
and (8.42) imply 


IU pmixU*, 6) = Ie(Pmix, KO Ku) = Tc(Pmix, YO KO Ky). 
Applying (8.38) to the case with pmix and v o Ko Ky, we have 
log |®| — [.(pmix,; V0 KO Ky) <26 (log |®| — log 6). 
(c) Combining (a) and (b), we have 


1 
lim — max I,(p,%® 


n-on peS(H&") 


1 2 
>— log min(|® |, di) — =d, (log min(|® |, d) — log bn) , 
n n 
where 6, = Vv 4e2[@™]. Since Lemma A.1 implies 


. ol ie ak 1 1 on 
lim — max [.(p,4°") =limsup— =limsup— max J[,(p, >"), 
NCO N peS(H§") n>oo NN noo MN peS(H§") 


taking the limit, we obtain (9.149). 
(d) The first inequality of (8.38) implies that 11.(p, K2") < logd,. When limy_, 5 
1 MAX pe 5(H12") I.(p, K2") < log da, (9.149) implies that 


! 1 
lim sup — log min{|@ |, a} = lim sup — log |® |. 
n>oo N n>oo Nl 


Hence, we have limy_; 5 + MAX pes (112") I.(p, 52") = Cq,2(K). 
When lim); 00 7 MAX pes(712") I.(p, K®”") = logd,, we can show that logd, > 
Cq,2(K) as follows. Consider a sequence of codes &”) = (H®",7™, v™). Then, 
8.22) implies that 1 — e[®] = F.(pmix, U™ 0 Ke" 07) < ,/4_. When 2 
p p exon 


[6] + 0,wehavelim sup,_,,, + log |®| < d4, whichimplies log d4 > Cy,2(k). 


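Exercise 9.24 Consider two states $\rho_1$ and $\rho_2$. Choose their purifications $|\Phi_1\rangle\langle\Phi_1|$ and $|\Phi_2\rangle\langle\Phi_2|$ so that their reference systems $\mathcal{H}_{R_1}$ and $\mathcal{H}_{R_2}$ are disjoint from each other. Then, we can choose a purification $|\Phi\rangle\langle\Phi|$ of $\rho := \lambda\rho_1 + (1-\lambda)\rho_2$ such that the reference system is $\mathcal{H}_{R_1}\oplus\mathcal{H}_{R_2}$, $P_1|\Phi\rangle\langle\Phi|P_1 = \lambda|\Phi_1\rangle\langle\Phi_1|$, and $P_2|\Phi\rangle\langle\Phi|P_2 = (1-\lambda)|\Phi_2\rangle\langle\Phi_2|$, where $P_i$ is the projection to $\mathcal{H}_{R_i}$. Hence, we have

\[
\begin{aligned}
I_{\mathrm{SDP}}(\rho,\kappa) &= E_{\mathrm{SDP}}(\kappa(|\Phi\rangle\langle\Phi|)) = E_{\mathrm{SDP}}\big(\kappa(\lambda|\Phi_1\rangle\langle\Phi_1| + (1-\lambda)|\Phi_2\rangle\langle\Phi_2|)\big)\\
&= \lambda E_{\mathrm{SDP}}(\kappa(|\Phi_1\rangle\langle\Phi_1|)) + (1-\lambda)E_{\mathrm{SDP}}(\kappa(|\Phi_2\rangle\langle\Phi_2|))\\
&= \lambda I_{\mathrm{SDP}}(\rho_1,\kappa) + (1-\lambda)I_{\mathrm{SDP}}(\rho_2,\kappa).
\end{aligned}
\]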
Chapter 10
Source Coding in Quantum Systems


Abstract Data compression software has become an indispensable tool for today's network systems. Why is such compression possible? Information commonly possesses redundancies. In other words, information possesses some regularity. If one randomly types letters of the alphabet, it is highly unlikely that the letters will form a meaningful sentence or program. Imagine that we are assigned the task of communicating a sequence of 1000 binary digits via telephone. Assume that the binary digits satisfy the following rule: the 2nth and (2n + 1)th digits of this sequence are the same. Naturally, we would not read out all 1000 digits of the sequence; we would first explain that the 2nth and (2n + 1)th digits are the same, and then read out the even-numbered (or odd-numbered) digits. We may even check whether there is any further structure in the sequence. In this way, compression software works by changing the input sequence of letters (or numbers) into another sequence of letters that can reproduce the original sequence, thereby reducing the necessary storage. The compression process may therefore be regarded as an encoding. This procedure is called source coding in order to distinguish it from the channel coding examined in Chap. 4. Applying this idea to the quantum scenario, any redundant information present in a quantum system may similarly be compressed to a smaller quantum memory for storage or communication. However, in contrast to the classical case, we have at least two distinct scenarios. The task in the first scenario is to save memory in a quantum computer. This will be relevant when quantum computers are used in practice. In this case, a given quantum state is converted into a state on a system of smaller size (dimension). The original state must then be recoverable from the compressed state. Note that the encoder does not know which state is to be compressed. The task in the second scenario is to save the quantum system to be sent for quantum cryptography. In this case, the sender knows which state is to be sent. This provides the encoder with more options for compression. In the decompression stage, there is no difference between the first and second scenarios, since their tasks are conversions from one quantum system to another. In this chapter, the two scenarios of compression outlined above are discussed in detail.
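The telephone example can be sketched in a few lines of Python. This is only an illustration under one assumption that is not in the text: the digits are indexed from 0, so that the pairs (2n, 2n + 1) for n = 0, ..., 499 cover the whole sequence. The function names are hypothetical.

```python
import random

def compress(bits):
    """Keep only the digits at even positions (0-based); the rule fixes the rest."""
    return bits[0::2]

def decompress(half):
    """Rebuild the full sequence by repeating every transmitted digit once."""
    out = []
    for b in half:
        out.extend((b, b))
    return out

random.seed(0)
half = [random.randint(0, 1) for _ in range(500)]
original = decompress(half)          # a 1000-digit sequence obeying the rule
transmitted = compress(original)     # only 500 digits need to be read out
recovered = decompress(transmitted)
```

The receiver must know the rule in advance; the saving comes entirely from that shared knowledge of the regularity.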
10.1 Four Kinds of Source Coding Schemes in Quantum Systems


As discussed above, source coding can be formulated in two ways. In the encoding 
process of the first scheme, we perform a state evolution from an original quantum 
system to a system of lower dimension. In that of the second, the encoder prepares 
a state in a system of lower dimension depending on the input signal. In the first 
scenario, the state is unknown since only the quantum system is given. Hence, the 
first scheme is called blind. In the second scenario, the state is known, and this scheme 
is called visible. The quality of the compression is evaluated by its compression rate. 
Of course, a lower dimension of the compressed quantum system produces a better 
encoding in terms of its compression rate. We may choose the compression rate to 
be either fixed or dependent on the input state. Coding with a fixed compression
rate is called fixed-length coding, while coding in which the compression rate
depends on the input state is called variable-length coding. Therefore, there exist four schemes
for the problem, i.e., fixed-/variable-length and visible/blind coding. 

Let us summarize the known results on fixed- and variable-length compression in 
classical systems. In fixed-length compression, it is not possible to completely recover
all input signals. Decoders may erroneously recover some input signals. However, 
when the state on the input system is subject to a certain probability distribution 
and the compression rate is larger than a threshold, an application of a proper code 
reduces the probability of erroneously recovering the state so that the error probability 
is sufficiently close to zero [1, 2]. This threshold is called the minimum admissible 
rate. In order to treat this problem precisely, we often assume that the input data 
is subject to the n-fold independent and identical distribution of a given probability 
distribution with sufficiently large n. 

In variable-length compression, it is possible to construct a code recovering all 
input signals perfectly. This is an advantage of variable-length encoding over fixed-length
encoding. In this case, since no decoding errors occur,
we measure the quality of the variable-length encoding by the coding length. The 
worst-case scenario in this type of coding occurs when the coding length is greater 
than the input information. However, when the input is subject to a certain probability 
distribution, the average coding length can be shorter than the number of bits in the 
input. For an independent and identical distribution, it has been shown that the average 
coding length is equal to its entropy in the optimal case [1, 2]. 
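As a small illustration of the classical variable-length statement, the sketch below builds a Huffman code (a standard zero-error prefix code, chosen here for illustration and not taken from the text) and compares its average length with the entropy. The distribution `p` is a hypothetical dyadic example for which the two coincide exactly; for general distributions the single-letter average length lies between the entropy and the entropy plus one bit.

```python
import heapq
import math

def entropy_bits(p):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def huffman_lengths(p):
    """Codeword lengths of a binary Huffman code for the distribution p."""
    # Heap items: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(q, i, [i]) for i, q in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    counter = len(p)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1          # merged symbols move one level deeper
        heapq.heappush(heap, (q1 + q2, counter, s1 + s2))
        counter += 1
    return lengths

p = [0.5, 0.25, 0.125, 0.125]        # hypothetical source distribution
lengths = huffman_lengths(p)
average_length = sum(q * l for q, l in zip(p, lengths))
```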

Let us now turn to quantum systems. As for the classical case, for fixed-length 
coding, it is possible to construct a coding protocol with an error of sufficiently small 
size for both visible and blind cases, provided the compression rate is larger than 
a certain value [3, 4]. This construction will be examined in more detail later. In 
fact such an encoding has already been realized experimentally [5]. For variable- 
length coding, in many cases there does not exist a code with zero error of a smaller 
coding length than the size of the input information [6]. However, when we replace the 
condition “zero error” by “almost zero error,” it is possible to construct codes with the 
admissible compression rate. Therefore, if the information source is a quantum state 
that is generated by an n-fold independent and identical distribution of a “known”
distribution, variable-length encoding does not offer any advantage. 

On the other hand, if we do not know the probability distribution that generates the
quantum state, the situation is entirely different. In fixed-length coding, since the 
compression rate is fixed a priori, it is impossible to recover the input state with a 
small error when the compression rate is less than the minimum admissible rate. In 
this case, it is preferable to use variable-length encoding wherein the compression 
rate depends on the input state [7, 8]. However, as a measurement is necessary to 
determine the compression rate, the determination of the compression rate causes 
the state reduction due to the quantum mechanical nature. Consider a method to 
determine the compression rate based on the approximate estimate of the input state. 
This approximate estimation requires a measurement. If the initial state is changed 
considerably due to the measurement, clearly we cannot expect that the decoding 
error is close to zero. It is therefore necessary to examine the trade-off between the 
degree of state reduction and the estimation error of the distribution of the input state, 
which is required for determining the encoding method. 

As will be discussed later, both this estimation error and the degree of the state 
reduction can be made to approach zero simultaneously and asymptotically. There- 
fore, even when the probability distribution for the quantum state is unknown, we can 
asymptotically construct a variable-length code such that the minimum admissible 
rate is achieved with a probability close to 1 and the decoding error is almost 0 [9, 10].
In particular, a coding protocol that is effective for all probability distributions
is called universal; universality is an important topic in information theory.
Various other types of source compression problems have also been studied [11, 12]. 


10.2 Quantum Fixed-Length Source Coding 


The source of the quantum system $\mathcal{H}$ is denoted by

\[
W : \mathcal{X} \to \mathcal{S}(\mathcal{H}) \quad (x \mapsto W_x) \tag{10.1}
\]

(which is the same notation as that for a quantum channel) and a probability distribution $p$ on $\mathcal{X}$. That is, the quantum information source is described by the ensemble $(p_x, W_x)_{x\in\mathcal{X}}$. Let $\mathcal{K}$ be the compressed quantum system. For the blind case, the encoder is represented by a TP-CP map $\tau$ from $\mathcal{S}(\mathcal{H})$ to $\mathcal{S}(\mathcal{K})$. The decoder is represented by a TP-CP map $\nu$ from $\mathcal{S}(\mathcal{K})$ to $\mathcal{S}(\mathcal{H})$. The triplet $\psi = (\mathcal{K}, \tau, \nu)$ is then called a blind code. In the visible case, the encoder is not as restricted as in the blind case. In this case, the encoder is given by a map $T$ from $\mathcal{X}$ to $\mathcal{S}(\mathcal{K})$. Any blind encoder $\tau$ can be converted into a visible encoder according to $\tau\circ W$. The triplet $\Psi = (\mathcal{K}, T, \nu)$ is then called a visible code. That is, the information is stored in a quantum memory. The errors $\varepsilon_{p,W}(\psi)$ and $\varepsilon_{p,W}(\Psi)$ and sizes $|\psi|$ and $|\Psi|$ of the codes $\psi$ and $\Psi$, respectively, are defined as follows:

\[
\varepsilon_{p,W}(\psi) := \sum_{x\in\mathcal{X}} p_x\big(1 - F^2(W_x, \nu\circ\tau(W_x))\big), \qquad |\psi| := \dim\mathcal{K}, \tag{10.2}
\]

\[
\varepsilon_{p,W}(\Psi) := \sum_{x\in\mathcal{X}} p_x\big(1 - F^2(W_x, \nu\circ T(x))\big), \qquad |\Psi| := \dim\mathcal{K}. \tag{10.3}
\]


We used $1 - F^2(\cdot,\cdot)$ in our definition of the decoding error.
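To make the error criterion concrete, the following sketch evaluates the visible-code error for a deliberately extreme case: a code with $\dim\mathcal{K} = 1$, so the decoder must output the same state for every signal. The ensemble (the states $|0\rangle$ and $|+\rangle$ with equal probability) and the decoder output $|\varphi\rangle$ are hypothetical choices made for illustration. For a pure state $W_x = |w_x\rangle\langle w_x|$ and a pure output, $F^2$ reduces to the squared overlap.

```python
import math

def overlap_sq(u, v):
    """|<u|v>|^2 for two real unit vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v)) ** 2

def error_fixed_output(ensemble, phi):
    """sum_x p_x (1 - F^2(W_x, |phi><phi|)) for pure states W_x = |w_x><w_x|.

    With dim K = 1 the decoder output cannot depend on x, so every signal
    is decoded to the same pure state |phi>.
    """
    return sum(p * (1 - overlap_sq(w, phi)) for p, w in ensemble)

ket0 = [1.0, 0.0]
ket_plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
ensemble = [(0.5, ket0), (0.5, ket_plus)]               # hypothetical source (p_x, W_x)
phi = [math.cos(math.pi / 8), math.sin(math.pi / 8)]    # state halfway between the two

err = error_fixed_output(ensemble, phi)
```

Here both signals have the same overlap with $|\varphi\rangle$, and the error equals $\sin^2(\pi/8)$, roughly 0.146.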

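Now, let the source be given by the quantum system $\mathcal{H}^{\otimes n}$ and its candidate states be given by $W^{(n)} : \mathcal{X}^n \to \mathcal{S}(\mathcal{H}^{\otimes n})$ $(x^n = (x_1,\ldots,x_n) \mapsto W^{(n)}_{x^n} := W_{x_1}\otimes\cdots\otimes W_{x_n})$. Further, let the probability distribution for these states be given by the nth-order independent and identical distribution $p^n$ of the probability distribution $p$ on $\mathcal{X}$. Denote the blind and visible codes by $\psi^{(n)}$ and $\Psi^{(n)}$, respectively. Define¹

\[
R_{B,q}(p,W) := \inf_{\{\psi^{(n)}\}}\Big\{\limsup_{n\to\infty}\frac{1}{n}\log|\psi^{(n)}| \;\Big|\; \varepsilon_{p^n,W^{(n)}}(\psi^{(n)}) \to 0\Big\}, \tag{10.4}
\]
\[
R_{V,q}(p,W) := \inf_{\{\Psi^{(n)}\}}\Big\{\limsup_{n\to\infty}\frac{1}{n}\log|\Psi^{(n)}| \;\Big|\; \varepsilon_{p^n,W^{(n)}}(\Psi^{(n)}) \to 0\Big\}, \tag{10.5}
\]
\[
R^{\dagger}_{B,q}(p,W) := \inf_{\{\psi^{(n)}\}}\Big\{\limsup_{n\to\infty}\frac{1}{n}\log|\psi^{(n)}| \;\Big|\; \liminf_{n\to\infty}\varepsilon_{p^n,W^{(n)}}(\psi^{(n)}) < 1\Big\}, \tag{10.6}
\]
\[
R^{\dagger}_{V,q}(p,W) := \inf_{\{\Psi^{(n)}\}}\Big\{\limsup_{n\to\infty}\frac{1}{n}\log|\Psi^{(n)}| \;\Big|\; \liminf_{n\to\infty}\varepsilon_{p^n,W^{(n)}}(\Psi^{(n)}) < 1\Big\}. \tag{10.7}
\]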


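Since a blind code $\psi^{(n)}$ can be regarded as a visible code, we have

\[
R_{B,q}(p,W) \ge R_{V,q}(p,W), \qquad R^{\dagger}_{B,q}(p,W) \ge R^{\dagger}_{V,q}(p,W). \tag{10.8}
\]

From the definitions it is also clear that

\[
R_{B,q}(p,W) \ge R^{\dagger}_{B,q}(p,W), \qquad R_{V,q}(p,W) \ge R^{\dagger}_{V,q}(p,W). \tag{10.9}
\]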


The following theorem holds with respect to the above. 


Theorem 10.1 If all of the states $W_x$ are pure states, then the quantities defined above are equal. We have

\[
R_{B,q}(p,W) = R^{\dagger}_{B,q}(p,W) = R_{V,q}(p,W) = R^{\dagger}_{V,q}(p,W) = H(W_p). \tag{10.10}
\]
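As a hedged numerical illustration of (10.10), consider a hypothetical qubit source that emits $|0\rangle$ or $|+\rangle$ with probability 1/2 each. Its average state is $W_p = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|+\rangle\langle +|$, and the sketch below compares $H(W_p)$ with the classical entropy $H(p)$ of the labels, both in bits. The example and helper names are not from the text.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a list of probabilities (or eigenvalues)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [mean + radius, mean - radius]

# W_p = 1/2 |0><0| + 1/2 |+><+| = [[3/4, 1/4], [1/4, 1/4]]
eigs = eigenvalues_sym2(0.75, 0.25, 0.25)
rate_quantum = entropy_bits(eigs)            # H(W_p), about 0.60 bits per signal
rate_classical = entropy_bits([0.5, 0.5])    # H(p) = 1 bit per signal
```

Because the two signal states are not orthogonal, $H(W_p)$ is strictly smaller than $H(p)$: the quantum memory needed per signal is less than the one bit required to store the label $x$ itself.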


This theorem can be proved by combining the following two lemmas. 


Lemma 10.1 (Direct Part) For an arbitrary real number $\delta > 0$, there exists a sequence of blind codes $\{\psi^{(n)}\}$ satisfying

\[
\frac{1}{n}\log|\psi^{(n)}| \le H(W_p) + \delta, \tag{10.11}
\]
\[
\varepsilon_{p^n,W^{(n)}}(\psi^{(n)}) \to 0. \tag{10.12}
\]


¹ The subscript q indicates the “quantum” memory.


Lemma 10.2 (Converse Part) If all of the states $W_x$ are pure states and the sequence of visible codes $\{\Psi^{(n)}\}$ satisfies

\[
\limsup_{n\to\infty}\frac{1}{n}\log|\Psi^{(n)}| < H(W_p), \tag{10.13}
\]

then

\[
\varepsilon_{p^n,W^{(n)}}(\Psi^{(n)}) \to 1. \tag{10.14}
\]

Lemma 10.1 tells us that $R_{B,q}(p,W) \le H(W_p)$, and Lemma 10.2 tells us that $R^{\dagger}_{V,q}(p,W) \ge H(W_p)$. Using (10.8) and (10.9), we thus obtain (10.10).

Further, we have another fixed-length coding scheme. In this scheme, the state is given as a pure state $|x\rangle\langle x|$ on the composite system $\mathcal{H}_A\otimes\mathcal{H}_R$, and the encoder and decoder can treat only the local system $\mathcal{H}_A$. Then, our task is to recover the state $|x\rangle\langle x|$ on the composite system $\mathcal{H}_A\otimes\mathcal{H}_R$. Hence, the code of this scheme is the triplet $\psi = (\mathcal{K}, \tau, \nu)$, which is the same as that of the blind scheme. The error is given as

\[
\varepsilon_\rho(\psi) := 1 - \langle x|(\nu\otimes\iota_R)\circ(\tau\otimes\iota_R)(|x\rangle\langle x|)|x\rangle = 1 - F_e^2(\rho, \nu\circ\tau),
\]

where $\rho = \operatorname{Tr}_R|x\rangle\langle x|$. Recall the definition of the entanglement fidelity (8.19). Hence, the quality depends only on the reduced density $\rho$. This scheme is called the purification scheme, while the former scheme with the visible case or the blind case is called the ensemble scheme. Hence, we define the minimum compression rate as

\[
R_{P,q}(\rho) := \inf_{\{\psi^{(n)}\}}\Big\{\limsup_{n\to\infty}\frac{1}{n}\log|\psi^{(n)}| \;\Big|\; \varepsilon_{\rho^{\otimes n}}(\psi^{(n)}) \to 0\Big\}, \tag{10.15}
\]
\[
R^{\dagger}_{P,q}(\rho) := \inf_{\{\psi^{(n)}\}}\Big\{\limsup_{n\to\infty}\frac{1}{n}\log|\psi^{(n)}| \;\Big|\; \liminf_{n\to\infty}\varepsilon_{\rho^{\otimes n}}(\psi^{(n)}) < 1\Big\}. \tag{10.16}
\]


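When all of the states $W_x$ are pure, inequality (8.24) implies that $1 - \varepsilon_{W_p}(\psi) \le 1 - \varepsilon_{p,W}(\psi)$, i.e., a code with a small error in the purification scheme also has a small error in the ensemble scheme. Hence, we have

\[
R_{P,q}(W_p) \ge R_{B,q}(p,W), \qquad R^{\dagger}_{P,q}(W_p) \ge R^{\dagger}_{B,q}(p,W). \tag{10.17}
\]

Using this relation, we can show the following theorem (Exercise 10.3).


Theorem 10.2

\[
R_{P,q}(\rho) = R^{\dagger}_{P,q}(\rho) = H(\rho). \tag{10.18}
\]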
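Exercises


10.1 Show that the condition $\sum_{x\in\mathcal{X}^n} p^n_x F(W^{(n)}_x, \nu_n(T_n(x))) \to 1$ is equivalent to the condition $\sum_{x\in\mathcal{X}^n} p^n_x F(W^{(n)}_x, \nu_n(T_n(x)))^2 \to 1$.


10.2 Define other error functions

\[
\bar{\varepsilon}_{p,W}(\psi) := \sum_{x\in\mathcal{X}} p_x d_1(W_x, \nu\circ\tau(W_x)), \tag{10.19}
\]
\[
\bar{\varepsilon}_{p,W}(\Psi) := \sum_{x\in\mathcal{X}} p_x d_1(W_x, \nu\circ T(x)). \tag{10.20}
\]

Show that the optimal rates $R_{B,q}(p,W)$ and $R_{V,q}(p,W)$ given in (10.4) and (10.5) are not changed even when the conditions $\varepsilon_{p^n,W^{(n)}}(\psi^{(n)}) \to 0$ and $\varepsilon_{p^n,W^{(n)}}(\Psi^{(n)}) \to 0$ are replaced by $\bar{\varepsilon}_{p^n,W^{(n)}}(\psi^{(n)}) \to 0$ and $\bar{\varepsilon}_{p^n,W^{(n)}}(\Psi^{(n)}) \to 0$.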


10.3 Construction of a Quantum Fixed-Length Source Code


Let us construct a blind fixed-length code that attains the minimum compression rate $H(W_p)$ when the quantum state is generated subject to the independent and identical distribution of the probability distribution $p$. (Since any blind code can be regarded as a visible code, it is sufficient to construct a blind code.) Since formula (8.24) guarantees that

\[
\varepsilon_{p,W}(\psi) = \sum_{x\in\mathcal{X}} p(x)\big(1 - F^2(W_x, \nu\circ\tau(W_x))\big) \le 1 - F_e^2(W_p, \nu\circ\tau) = \varepsilon_{W_p}(\psi), \tag{10.21}
\]

it is sufficient to treat the purification scheme.
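Now define $\rho^P_{\mathrm{mix}} := \frac{P}{\operatorname{Tr}P}$. Let the encoder $\tau_P : \mathcal{S}(\mathcal{H}) \to \mathcal{S}(\operatorname{Ran}P)$, using the projection $P$ in $\mathcal{H}$, be given by

\[
\tau_P(\rho) := P\rho P + \operatorname{Tr}[(I-P)\rho]\,\rho^P_{\mathrm{mix}}. \tag{10.22}
\]

Define the decoder $\nu_P$ as the natural embedding from $\mathcal{S}(\operatorname{Ran}P)$ to $\mathcal{S}(\mathcal{H})$, where $\operatorname{Ran}A$ is the range of $A$.
Let $x$ be the purification of $\rho$. Then,

\[
\begin{aligned}
F_e^2(\rho, \nu_P\circ\tau_P)
&= \langle x|(I\otimes P)|x\rangle\langle x|(I\otimes P)|x\rangle
 + \langle x|\operatorname{Tr}_{\mathcal{H}}\big[(I\otimes(I-P))|x\rangle\langle x|(I\otimes(I-P))\big]\otimes\rho^P_{\mathrm{mix}}|x\rangle\\
&\ge \langle x|(I\otimes P)|x\rangle\langle x|(I\otimes P)|x\rangle = (\operatorname{Tr}P\rho)^2\\
&= \big(1 - (1-\operatorname{Tr}P\rho)\big)^2 \ge 1 - 2(1-\operatorname{Tr}P\rho).
\end{aligned} \tag{10.23}
\]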


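We now define $b(s,R) := \frac{R-\psi(s)}{1-s}$, $\psi(s) := \psi(s|\rho)$, $s_0 := \operatorname{argmax}_{0\le s\le 1}\frac{sR-\psi(s)}{1-s}$ for $R > H(\rho)$ and $0 \le s < 1$. We choose $P$ such that

\[
P_n := \big\{\rho^{\otimes n} - e^{-nb(s_0,R)} \ge 0\big\}. \tag{10.24}
\]

Then, from (3.2), (3.4), and (10.23), the code $\Phi^{(n)} = (\mathcal{K}_n, \tau_{P_n}, \nu_{P_n})$ satisfies

\[
\begin{aligned}
\dim\mathcal{K}_n &= \dim\operatorname{Ran}P_n = \operatorname{Tr}\big\{\rho^{\otimes n} - e^{-nb(s_0,R)} \ge 0\big\} \le e^{nR},\\
\varepsilon_{\rho^{\otimes n}}(\Phi^{(n)}) &\le 2\operatorname{Tr}\rho^{\otimes n}\big\{\rho^{\otimes n} - e^{-nb(s_0,R)} < 0\big\} \le 2e^{n\frac{\psi(s_0)-s_0R}{1-s_0}}.
\end{aligned} \tag{10.25}
\]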


The combination of (10.21) and (10.25) proves the existence of a code that attains the compression rate $H(\rho) + \delta$ for arbitrary $\delta > 0$. Hence, we have proven Lemma 10.1. Note that the code constructed here depends only on the state $\rho$ and the rate $R$. In order to emphasize this dependence, we denote this encoding and decoding as $\tau_{n,\rho,R}$ and $\nu_{n,\rho,R}$, respectively.
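The two bounds in (10.25) can be checked numerically for a qubit state. The sketch below assumes $\psi(s|\rho) = \log\operatorname{Tr}\rho^{1-s}$ (the definition is not repeated in this section), takes $\rho$ diagonal with eigenvalues $(q, 1-q)$, and uses the fact that $\rho^{\otimes n}$ then has eigenvalue $q^k(1-q)^{n-k}$ with multiplicity $\binom{n}{k}$. The parameter values and the grid search for $s_0$ are illustrative choices.

```python
import math

def psi(s, q):
    """Assumed form psi(s|rho) = log Tr rho^(1-s) for eigenvalues (q, 1-q)."""
    return math.log(q ** (1 - s) + (1 - q) ** (1 - s))

def projection_code(n, q, R, grid=1000):
    """Return (s0, dim Ran P_n, Tr rho^n (I - P_n)) for the projection code."""
    # Grid search for s0 in [0, 1); any s in this range gives valid bounds.
    s0 = max((i / grid for i in range(grid)),
             key=lambda s: (s * R - psi(s, q)) / (1 - s))
    b = (R - psi(s0, q)) / (1 - s0)
    dim = 0
    discarded = 0.0
    for k in range(n + 1):
        log_eig = k * math.log(q) + (n - k) * math.log(1 - q)
        mult = math.comb(n, k)
        if log_eig >= -n * b:          # eigenvalue kept by P_n
            dim += mult
        else:                          # eigenvalue cut off by P_n
            discarded += mult * math.exp(log_eig)
    return s0, dim, discarded

n, q, R = 200, 0.1, 0.45               # H(rho) is about 0.325 nats < R
s0, dim, discarded = projection_code(n, q, R)
dim_bound = math.exp(n * R)
error_bound = 2 * math.exp(n * (psi(s0, q) - s0 * R) / (1 - s0))
```

By (10.23) the entanglement-fidelity error of the code is at most `2 * discarded`, which in turn is at most `error_bound`.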

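We next show that the code given above still works even when the true density $\rho'$ is slightly different from the predicted density $\rho$. This property is called robustness and is important for practical applications.

Let us consider the case where the true density $\rho'$ is close to the predicted one $\rho$. Choose a real number $\alpha > 0$ such that

\[
\rho' \le \rho\,e^{\alpha}. \tag{10.26}
\]

Hence,

\[
\rho'^{\otimes n} \le e^{n\alpha}\rho^{\otimes n}.
\]

Using the same argument as that in the derivation of (10.25), we obtain

\[
\varepsilon_{\rho'^{\otimes n}}(\Phi^{(n)}) \le 2\operatorname{Tr}\rho'^{\otimes n}\big\{\rho^{\otimes n} - e^{-nb(s_0,R)} < 0\big\}
\le 2e^{n\alpha}\operatorname{Tr}\rho^{\otimes n}\big\{\rho^{\otimes n} - e^{-nb(s_0,R)} < 0\big\} \le 2e^{n\left(\alpha + \frac{\psi(s_0)-s_0R}{1-s_0}\right)}. \tag{10.27}
\]

Therefore, if $\alpha < \max_{0\le s\le 1}\frac{sR-\psi(s)}{1-s}$, then $\varepsilon_{\rho'^{\otimes n}}(\Phi^{(n)}) \to 0$.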
Let us now prove the converse part of the theorem, i.e., Lemma 10.2. For a proof 
of Lemma 10.2, we prepare the following lemma, which is proved in Sect. 10.10. 
Lemma 10.3 (Hayashi [13]) Any visible code $\Psi = (\mathcal{K}, T, \nu)$ satisfies

\[
1 - \varepsilon_{p,W}(\Psi) \le a|\Psi| + \operatorname{Tr}W_p\{W_p - a \ge 0\} \tag{10.28}
\]

for $\forall a > 0$.

Proof of Lemma 10.2 The above inequality (10.28) can be regarded as the “dual” inequality of inequality (10.25) given in the proof of the direct part of the theorem. Inequality (10.28) shows that the quality of any code is evaluated by use of $\operatorname{Tr}W_p\{W_p - a \ge 0\}$. Inequality (10.28) plays the same role as (2.5.2) in Sect. 2.1.4, and thus any sequence of codes $\{\Psi^{(n)}\}$ with $|\Psi^{(n)}| \le e^{nR}$ satisfies, for $s \le 0$,

\[
1 - \varepsilon_{p^n,W^{(n)}}(\Psi^{(n)}) \le 2e^{n\frac{\psi(s)-sR}{1-s}}. \tag{10.29}
\]

Choosing an appropriate $s_0 < 0$, we have $\frac{\psi(s_0)-s_0R}{1-s_0} < 0$. Therefore, we obtain (10.14), which gives us Lemma 10.2. ∎
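The sign of the exponent in (10.29) can be explored numerically. The sketch below again assumes $\psi(s) = \log\operatorname{Tr}W_p^{1-s}$ and a qubit average state with eigenvalues $(q, 1-q)$; the search range for $s$ and the rates are arbitrary illustrative choices.

```python
import math

def psi(s, q):
    """Assumed form psi(s) = log Tr W_p^(1-s) for eigenvalues (q, 1-q)."""
    return math.log(q ** (1 - s) + (1 - q) ** (1 - s))

def converse_exponent(q, R, s_min=-5.0, steps=5000):
    """Grid minimum over s in [s_min, 0] of (psi(s) - s R) / (1 - s)."""
    best = 0.0                          # value at s = 0, since psi(0) = 0
    for i in range(1, steps + 1):
        s = s_min * i / steps
        best = min(best, (psi(s, q) - s * R) / (1 - s))
    return best

q = 0.1                                  # H(W_p) is about 0.325 nats
below = converse_exponent(q, R=0.2)      # rate below the entropy
above = converse_exponent(q, R=0.4)      # rate above the entropy
bound_n_2000 = 2 * math.exp(2000 * below)
```

For a rate below the entropy the exponent is strictly negative, so the bound on $1 - \varepsilon$ decays exponentially in $n$. For a rate above the entropy the minimum is attained at $s = 0$ and the bound becomes trivial, as expected.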


In order to construct a code with the compression rate H(W,), we replace R by 


H(W,) — vit in (10.29). Approximating w(s) as H(W,)s + sw’ (0)s?, we obtain 


aie v(s) — s(H(Wp) — ar) C 
s<0 l-s ~ Ww" (0)./n- 


(10.30) 
Hence, 


€ pn.wn (UW) < 20 V7 —> 0, (10.31) 


Finally, let us focus on the property of the state on the compressed system when 
the asymptotic compression rate is H(W,). 


Theorem 10.3 (Han [14]) When a sequence of codes $\{\Psi_n = (\mathcal{K}_n, T_n, \nu_n)\}$ satisfies

\[
\varepsilon_{p^n,W^{(n)}}(\Psi_n) \to 0, \qquad \frac{1}{n}\log|\Psi_n| \to H(W_p), \tag{10.32}
\]

we obtain

\[
\frac{1}{n}D\Big(\sum_{x\in\mathcal{X}^n} p^n_x T_n(x)\,\Big\|\,\rho^{\mathcal{K}_n}_{\mathrm{mix}}\Big) \to 0. \tag{10.33}
\]

That is, the compressed state is almost completely mixed in the sense of the normalized quantum relative entropy $\frac{1}{n}D(\rho_n\|\sigma_n)$. However, the compressed state is different from the completely mixed state if we focus on the Bures distance, trace norm, or quantum relative entropy. This fact has been shown in the classical case [15].




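Proof From the monotonicity of the transmission information (5.59) we have

\[
\begin{aligned}
\log|\Psi_n| &\ge H\Big(\sum_x p^n_x T_n(x)\Big) \ge H\Big(\sum_x p^n_x T_n(x)\Big) - \sum_x p^n_x H(T_n(x))\\
&\ge H\Big(\sum_x p^n_x \nu_n(T_n(x))\Big) - \sum_x p^n_x H(\nu_n(T_n(x))).
\end{aligned}
\]

From condition (10.32) the two conditions in (5.105) yield

\[
\lim_{n\to\infty}\frac{1}{n}\Big(H\Big(\sum_x p^n_x \nu_n(T_n(x))\Big) - \sum_x p^n_x H(\nu_n(T_n(x)))\Big) = H(W_p).
\]

Hence, we obtain

\[
\lim_{n\to\infty}\frac{1}{n}H\Big(\sum_x p^n_x T_n(x)\Big) = H(W_p).
\]

Since $\operatorname{Tr}\sum_x p^n_x T_n(x)\log\rho^{\mathcal{K}_n}_{\mathrm{mix}} = -\log|\Psi_n|$, relation (10.33) holds. ∎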
Exercise 
10.3 Prove (10.18) by using (10.25) and (10.17). 


10.4 Universal Quantum Fixed-Length Source Codes 


The code given in Sect. 10.3 depends on the quantum state $W_p$. For the classical case, there exists a code that depends only on the entropy $H(p)$ and works when the data are generated subject to the independent and identical information source of $p$. Such codes are called universal codes. To construct such universal codes, we often use the method of types, which is discussed in Sect. 2.4.1 [16]. Similarly, for the quantum case, there exists a code that depends only on the entropy $H(W_p)$ of the average state $W_p$ and works well provided the states are generated according to an independent and identical distribution of $p$ [17]. In this section, we present such a universal code.

For this purpose, the projection $P_n$ given by (10.24) should depend only on the compression rate. That is, we have to construct a subspace $\Upsilon_n(R)$ of $\mathcal{H}^{\otimes n}$ depending only on the compression rate $R$. As the first step, we construct a code depending only on the compression rate and the basis $B = \{u_1,\ldots,u_d\}$ comprising the eigenvectors of $W_p$. Let us consider the set of types $T_n$ with the probability space $\mathbb{N}_d = \{1,\ldots,d\}$. Define the subspace $\Upsilon_n(R,B)$ of the n-fold tensor product space $\mathcal{H}^{\otimes n}$ to be the space spanned by $\bigcup_{q\in T_n: H(q)\le R}\{u(i_n)\}_{i_n\in T^n_q}$, where $u(i_n) := u_{i_1}\otimes\cdots\otimes u_{i_n} \in \mathcal{H}^{\otimes n}$ and $i_n = (i_1,\ldots,i_n)$. Let $P_{n,R,B}$ be the projection to $\Upsilon_n(R,B)$. Then, according to the discussion in Sect. 2.4.1, we can show that (Exercise 10.4)
\[
\dim\Upsilon_n(R,B) \le (n+1)^d e^{nR}, \tag{10.34}
\]
\[
\operatorname{Tr}(I - P_{n,R,B})W_p^{\otimes n} \le (n+1)^d\exp\Big(-n\inf_{q:H(q)>R}D(q\|r)\Big), \tag{10.35}
\]

where $r$ is the distribution that consists of the diagonal elements of $W_p$. Hence, the code $\{(\Upsilon_n(R,B), \tau_{P_{n,R,B}}, \nu_{P_{n,R,B}})\}$ almost has the compression rate $R$. Since $\max_{0\le s\le 1}\frac{sR-\psi(s)}{1-s} = \min_{q:H(q)\ge R}D(q\|r)$, its entanglement fidelity $F_e(W_p^{\otimes n}, \nu_{P_{n,R,B}}\circ\tau_{P_{n,R,B}})$ asymptotically approaches 1 when $R > H(W_p)$. This code is effective when the basis $B = \{u_1,\ldots,u_d\}$ is known.
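For $d = 2$ the two type-counting bounds (10.34) and (10.35) can be evaluated exactly, since in the eigenbasis $B$ everything reduces to counting binary strings by their number of ones. The sketch below is a classical check under that reduction; the values of `n`, `r`, and `R` are illustrative, and the infimum in (10.35) is replaced by the minimum over the types of length $n$, which gives a bound that is at least as tight.

```python
import math

def h2(x):
    """Binary entropy in nats."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def kl2(x, r):
    """Relative entropy D((x, 1-x) || (r, 1-r)) in nats."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / r)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - r))
    return total

def type_code_stats(n, r, R):
    """Dimension of the kept subspace, discarded probability, smallest discarded divergence."""
    dim, tail, min_div = 0, 0.0, math.inf
    for k in range(n + 1):              # k = number of ones; the type is (k/n, 1 - k/n)
        x = k / n
        if h2(x) <= R:
            dim += math.comb(n, k)
        else:
            tail += math.comb(n, k) * r ** k * (1 - r) ** (n - k)
            min_div = min(min_div, kl2(x, r))
    return dim, tail, min_div

n, r, R = 100, 0.1, 0.45                # eigenvalues (r, 1-r); H(r) is about 0.325 nats
dim, tail, min_div = type_code_stats(n, r, R)
dim_bound = (n + 1) ** 2 * math.exp(n * R)
tail_bound = (n + 1) ** 2 * math.exp(-n * min_div)
```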

However, when the basis $B$ is unknown, we need a subspace depending only on the compression rate $R$. For this purpose, we define the subspace $\Upsilon_n(R)$ as the subspace spanned by $\bigcup_B\Upsilon_n(R,B)$. That is, we consider the union over all bases $B$ in $\mathcal{H}$. Then, the projection $P_{n,R}$ is defined as the projection to $\Upsilon_n(R)$. Thus, we can show that the space $\Upsilon_n(R)$ and the projection $P_{n,R}$ satisfy

\[
\dim\Upsilon_n(R) \le (n+1)^{d+d^2}e^{nR}, \tag{10.36}
\]
\[
\operatorname{Tr}(I - P_{n,R})W_p^{\otimes n} \le (n+1)^d\exp\Big(-n\inf_{q:H(q)>R}D(q\|r)\Big). \tag{10.37}
\]

Hence, the entanglement fidelity $F_e(W_p^{\otimes n}, \nu_{P_{n,R}}\circ\tau_{P_{n,R}})$ asymptotically approaches 1 when $R > H(W_p)$. Then, we can conclude that the blind code $(\Upsilon_n(R), \tau_{P_{n,R}}, \nu_{P_{n,R}})$ works when $R > H(W_p)$. Since the blind code $(\Upsilon_n(R), \tau_{P_{n,R}}, \nu_{P_{n,R}})$ does not depend on the basis of $W_p$, it can be regarded as a universal code.


Proofs of (10.36) and (10.37) Since (10.37) follows immediately from $P_{n,R} \ge P_{n,R,B}$, we prove inequality (10.36) as follows. For simplicity, we consider the case of $d = 2$, but this discussion may be easily extended to the general case. First, we fix the basis $B = \{u_1, u_2\}$. Then, an arbitrary basis $B' = \{u'_1, u'_2\}$ may be written as $u'_1 = au_1 + bu_2$, $u'_2 = cu_1 + du_2$ using $d^2 = 4$ complex numbers $a, b, c, d$. Thus,

\[
u'_1\otimes u'_2\otimes\cdots\otimes u'_1 = (au_1+bu_2)\otimes(cu_1+du_2)\otimes\cdots\otimes(au_1+bu_2).
\]

Choosing an appropriate vector $v_{n_1,n_2,n_3,n_4} \in \mathcal{H}^{\otimes n}$, we have

\[
u'_1\otimes u'_2\otimes\cdots\otimes u'_1 = \sum_{n_1,n_2,n_3,n_4} a^{n_1}b^{n_2}c^{n_3}d^{n_4}\,v_{n_1,n_2,n_3,n_4}.
\]

The vector $v_{n_1,n_2,n_3,n_4}$ does not depend on $a$, $b$, $c$, and $d$. Hence, the vector $u'_1\otimes u'_2\otimes\cdots\otimes u'_1$ belongs to the subspace spanned by the vectors $v_{n_1,n_2,n_3,n_4}$ with the condition $n_1+n_2+n_3+n_4 = n$. The dimension of this subspace is at most $(n+1)^{2^2-1} = (n+1)^{d^2-1}$. Since the dimension of the space $\Upsilon_n(R)$ is at most this number multiplied by the dimension of the space $\Upsilon_n(R,B)$ with a fixed basis, we obtain (10.36). ∎
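The counting step in this proof is easy to verify by brute force. The sketch below enumerates the exponent tuples $(n_1,\ldots,n_m)$ with $n_1+\cdots+n_m = n$ for $m = d^2 = 4$ and compares their number with the closed form $\binom{n+m-1}{m-1}$ and with the bound $(n+1)^{m-1}$ used above. The helper name is illustrative.

```python
import math
from itertools import product

def count_exponent_tuples(n, m):
    """Number of tuples of m non-negative integers that sum to n (brute force)."""
    return sum(1 for t in product(range(n + 1), repeat=m) if sum(t) == n)

n, m = 6, 4                              # m = d^2 with d = 2
count = count_exponent_tuples(n, m)
closed_form = math.comb(n + m - 1, m - 1)
bound = (n + 1) ** (m - 1)
```

The bound holds in general because $\binom{n+m-1}{m-1} = \prod_{j=1}^{m-1}\frac{n+j}{j}$ and each factor is at most $n+1$.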


Finally, we show an inequality complementary to (10.37), which will be used in the next section. Choosing $s_0 := \operatorname{argmin}_{s\le 0}\frac{\psi(s)-sR}{1-s}$ with $\psi(s) = sH_{1-s}(W_p)$, we obtain

\[
\begin{aligned}
\operatorname{Tr}P_{n,R}W_p^{\otimes n}
&\overset{(a)}{\le} 2\exp\Big(\min_{s\le 0}\frac{n\psi(s) - s\log\dim\Upsilon_n(R)}{1-s}\Big)\\
&\overset{(b)}{\le} 2\exp\Big(n\frac{\psi(s_0)-s_0R}{1-s_0} + \frac{-s_0}{1-s_0}(d+d^2)\log(n+1)\Big)\\
&\overset{(c)}{\le} 2(n+1)^{d+d^2}\exp\Big(-n\inf_{q:H(q)\le R}D(q\|r)\Big),
\end{aligned} \tag{10.38}
\]

where (a), (b), and (c) follow from (2.54), (10.36), and (2.65), respectively.
Inequality (10.38) implies that the blind code (Y,(R), Tp, ,, Vp, ,) does not work 
when R < H(W,). However, this conclusion can be trivially shown from Theorem 
10.1. 


Exercise 


10.4 Show (10.34) and (10.35) by using (2.154), (2.155) and (2.156). 


10.5 Universal Quantum Variable-Length Source Codes 


Let us construct a code that has a sufficiently small error and achieves the entropy rate $H(W_p)$, even though the entropy rate $H(W_p)$ is unknown, provided the source follows an independent and identical distribution of $p$. Such codes are called universal quantum variable-length source codes. For these codes, it is essential to determine the compression rate depending on the input state.² If nonorthogonal states are included in the source, then state reduction inevitably occurs when the compression rate is determined. Hence, the main problem is to reduce the amount of state reduction as much as possible [9, 10].

Let us first construct a measurement to determine the compression rate by using the projection $P_{n,R}$ given in the previous section. Consider the projection $E_{n,R} := \lim_{\epsilon \to +0}(P_{n,R} - P_{n,R-\epsilon})$. Let $\Omega_n = \{H(p)\}_{p \in T_n}$ be the set of $R$ such that $E_{n,R}$ is nonzero. Then, $\sum_{R \in \Omega_n} E_{n,R} = I$. Due to (10.37) and (10.38), the probability distribution for the outcome of the measurement $E_n = \{E_{n,R_i}\}_i$ satisfies

\[ \mathrm{P}^{E_n}_{W_p^{\otimes n}}\{|H(W_p) - R_i| \ge \epsilon\} \le 2\max\Bigl\{ (n+1)^{d+d^2} \exp\Bigl(-n \inf_{q:\,H(q)\le R-\epsilon} D(q\|r)\Bigr),\ (n+1)^{d} \exp\Bigl(-n \inf_{q:\,H(q)\ge R+\epsilon} D(q\|r)\Bigr) \Bigr\}. \]

² Even though the error of a universal quantum variable-length source code converges to 0, its convergence rate is not exponential [9].

We may therefore apply the arguments of Sect. 7.4 as follows. We choose $l_n$ and $\delta_n$ so that they satisfy (7.63) and (7.64). Then, the POVM $M^{(n),\delta_n,l_n}$ given from $E_n$ in Theorem 7.8 satisfies

\[ F_e(W_p^{\otimes n}, \kappa_{M^{(n),\delta_n,l_n}}) \to 1. \]

Since the measurement $M^{(n),\delta_n,l_n}$ takes values in $[0, \log d]$ with spacing $\delta_n$, the number of its possible outcomes is $\frac{\log d}{\delta_n} + 1$. Hence, we choose $\delta_n$ such that $\frac{1}{n}\log \delta_n \to 0$.

Construction of universal quantum variable-length source code We now construct a universal variable-length code based on this measurement. In the encoding step, we perform a measurement corresponding to the instrument $\kappa_{M^{(n),\delta_n,l_n}}$. When the measurement outcome is $R_i$, the resulting state is a state in $\operatorname{Ran} M_i^{(n),\delta_n,l_n}$. The state in the space $\operatorname{Ran} M_i^{(n),\delta_n,l_n}$ is sent with the outcome $R_i$. Then, the coding length is $\log \dim \operatorname{Ran} M_i^{(n),\delta_n,l_n} + \log\bigl(\frac{\log d}{\delta_n} + 1\bigr)$. The compression rate is this value divided by $n$.


Analysis of our code Since the second term converges to zero after the division by $n$, we only consider the first term. Since

\[ \dim \operatorname{Ran} M_i^{(n),\delta_n,l_n} = \sum_{R_i-\delta_n < R' \le R_i+\delta_n} \operatorname{rank} E_{n,R'} \le \dim \Upsilon_n(R_i+\delta_n) \le (n+1)^{d+d^2} e^{n(R_i+\delta_n)}, \]

we obtain

\[ \limsup_{n\to\infty} \frac{1}{n}\Bigl(\log \dim \operatorname{Ran} M_i^{(n),\delta_n,l_n} + \log\Bigl(\frac{\log d}{\delta_n} + 1\Bigr)\Bigr) \le R_i. \]

Therefore, in this protocol, the compression rate is asymptotically no greater than the entropy $H(W_p)$ with a probability of approximately 1 [more precisely, the compression rate converges to the entropy $H(W_p)$ in probability]; the error also approaches zero asymptotically. That is, we can conclude that the above protocol is a universal quantum variable-length source code.
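To see how quickly the overhead terms in the coding length vanish, the following sketch evaluates an upper bound on the compression rate for a measured outcome $R$. It assumes the dimension bound (10.36) in the form $(n+1)^{d+d^2} e^{nR}$ and an illustrative choice $\delta_n = n^{-1/4}$; both the function name `rate_upper_bound` and this choice of $\delta_n$ are assumptions made here, and the constraints (7.63) and (7.64) on $\delta_n$ are not checked.

```python
import math


def rate_upper_bound(n, R, d, delta_n):
    """Upper bound on the per-symbol coding length when the outcome is R.

    Uses log dim <= (d + d^2) log(n+1) + n (R + delta_n) for the quantum part
    and log(log d / delta_n + 1) for the classical label of the outcome.
    """
    dim_term = (d + d * d) * math.log(n + 1) + n * (R + delta_n)
    label_term = math.log(math.log(d) / delta_n + 1)
    return (dim_term + label_term) / n


if __name__ == "__main__":
    R, d = 0.5, 2
    for n in (10**2, 10**4, 10**6):
        delta_n = n ** -0.25  # illustrative spacing, assumed here
        print(n, rate_upper_bound(n, R, d, delta_n) - R)
```

The printed excess over $R$ is dominated by $\delta_n$ for large $n$, which illustrates why the spacing must shrink while $\frac{1}{n}\log\delta_n \to 0$.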


10.6 Mixed-State Case and Bipartite State Generation 


So far, we have treated quantum data compression when $W_x$ is pure. In this case, the optimal compression rate in the blind case coincides with that in the visible case. However, when $W_x$ is not pure, these are different. In this problem, one may think that the quantity $H(W_p)$ or $I(p, W)$ is a good candidate for the optimal rate. This intuition is not entirely inaccurate. In the blind scheme, if the ensemble $(p_x, W_x)$ has no trivial redundancy, the optimal compression rate is given as follows [18]:

\[ R_{B,q}(p, W) = H(W_p). \tag{10.39} \]

On the other hand, in the visible scheme, the inequality

\[ R_{V,q}(p, W) \ge I(p, W) \tag{10.40} \]

holds [19] (see Exercise 10.7). However, it does not give the optimal rate in general:

Theorem 10.4

\[ R_{V,q}(p, W) = E_c^{\dashrightarrow}(\tilde{W}_p) = \lim_{n\to\infty} \frac{1}{n} E_p(\tilde{W}_p^{\otimes n}), \qquad \tilde{W}_p := \sum_x p_x |e_x^A\rangle\langle e_x^A| \otimes W_x. \tag{10.41} \]

In fact, Horodecki [20] focused on the quantity

\[ H^{\mathrm{ext}}(p, W) := \inf_{W_x^{\mathrm{ext}}:\ \text{purification of } W_x} H\Bigl(\sum_x p_x W_x^{\mathrm{ext}}\Bigr) \]

and proved

\[ R_{V,q}(p, W) = \lim_{n\to\infty} \frac{H^{\mathrm{ext}}(p^n, W^{(n)})}{n}. \tag{10.42} \]

From the definition of $E_p(\tilde{W}_p)$, we can easily check that $E_p(\tilde{W}_p) \le H^{\mathrm{ext}}(p, W)$. When all of the states $W_x$ are pure, $R_{V,q}(p, W) = H(W_p)$. This fact matches (8.193).

Before the proof of Theorem 10.4, we address this problem in a more general framework. Suppose that, given a bipartite state $\rho$ on the bipartite system $\mathcal{H}_A \otimes \mathcal{H}_B$, Alice and Bob intend to share the state $\rho$ by using a limited amount of noiseless quantum communication. This task is called bipartite state generation. The following operations are allowed for this task. First, Alice generates a bipartite state $\rho'$ on the bipartite system $\mathcal{H}_A \otimes \mathcal{K}$. Second, Alice sends the system $\mathcal{K}$ to Bob. Finally, Bob applies a TP-CP map $\nu$ from the system $\mathcal{K}$ to $\mathcal{H}_B$. Then, Alice and Bob share the state $\nu \otimes \iota_A(\rho')$ on the bipartite system $\mathcal{H}_A \otimes \mathcal{H}_B$. In this case, our operation $\Psi$ is given as the triple $(\mathcal{K}, \rho', \nu)$, which is called a code. The performance of the code $\Psi$ is characterized by the following quantities. One is the dimension of $\mathcal{K}$, which is denoted by $|\Psi|$. The other is the error $1 - F(\rho, \nu \otimes \iota_A(\rho'))$, which is denoted by $\varepsilon_\rho(\Psi)$.

Now, we slightly modify the formulation of visible compression. For a code $\Psi = (\mathcal{K}, T, \nu)$, we consider another error:

\[ \tilde{\varepsilon}_{p,W}(\Psi) := \sum_{x \in \mathcal{X}} p_x \bigl(1 - F(W_x, \nu \circ T(x))\bigr), \tag{10.43} \]

which has the relation:

\[ \frac{1}{2}\varepsilon_{p,W}(\Psi) \le \tilde{\varepsilon}_{p,W}(\Psi) \le \varepsilon_{p,W}(\Psi). \tag{10.44} \]

So, even if we replace $\varepsilon_{p,W}(\Psi)$ by $\tilde{\varepsilon}_{p,W}(\Psi)$, our definition of (10.5) is not changed. When $\rho$ is given as $\tilde{W}_p$ defined in (10.41), the code $\Psi = (\mathcal{K}, T, \nu)$ for visible compression is converted to the code $\tilde{\Psi} = (\mathcal{K}, \rho', \nu)$ for bipartite state generation, where $\rho'$ is given as follows.

\[ \rho' = \sum_x p_x |e_x^A\rangle\langle e_x^A| \otimes T(x). \tag{10.45} \]

In this correspondence,

\[ \tilde{\varepsilon}_{p,W}(\Psi) = \varepsilon_{\tilde{W}_p}(\tilde{\Psi}). \tag{10.46} \]

Hence, we have

\[ \min_{\Psi}\{\tilde{\varepsilon}_{p,W}(\Psi) \mid |\Psi| = M\} \ge \min_{\tilde{\Psi}}\{\varepsilon_{\tilde{W}_p}(\tilde{\Psi}) \mid |\tilde{\Psi}| = M\}. \tag{10.47} \]

For the bipartite state generation, we define the following quantity:

\[ R_{g,q}(\rho) := \inf_{\{\Psi^{(n)}\}} \Bigl\{ \limsup_{n\to\infty} \frac{1}{n}\log|\Psi^{(n)}| \Bigm| \varepsilon_{\rho^{\otimes n}}(\Psi^{(n)}) \to 0 \Bigr\}. \tag{10.48} \]

Then, the following theorem holds.

Theorem 10.5

\[ R_{g,q}(\rho) = E_c^{\dashrightarrow}(\rho) = \lim_{n\to\infty} \frac{1}{n} E_p(\rho^{\otimes n}). \tag{10.49} \]

Before proving Theorem 10.4, we show Theorem 10.5. Due to (10.47), Theorem 10.5 implies the converse type inequality $R_{V,q}(p, W) \ge E_c^{\dashrightarrow}(\tilde{W}_p)$. So, after the proof of Theorem 10.5, we show the direct part of Theorem 10.4.

The direct part of Theorem 10.5 essentially is obtained by the following lemma. 


Lemma 10.4 Let $\kappa$ be a one-way LOCC operation from Alice to Bob. There exists a code $\Psi$ such that

\[ \varepsilon_\rho(\Psi) \le 1 - F(\rho, \kappa(|\Phi_L\rangle\langle\Phi_L|)), \tag{10.50} \]
\[ |\Psi| = L \cdot CC(\kappa), \tag{10.51} \]

where $CC(\kappa)$ is the size of the classical communication of $\kappa$.

This lemma can be shown as follows. First, Alice prepares the maximally entangled state $|\Phi_L\rangle\langle\Phi_L|$ and sends a part of it to Bob via the noiseless quantum channel. Then, Alice and Bob apply the one-way LOCC operation $\kappa$ from Alice to Bob. This operation satisfies the condition for the above code $\Psi$.


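Proof of Theorem 10.5 First, we prove the direct part. Using Lemma 10.4, we obtain the direct part as follows. Let $\kappa_n$ be a one-way LOCC operation satisfying

\[ \lim_{n\to\infty} F(\rho^{\otimes n}, \kappa_n(|\Phi_{L_n}\rangle\langle\Phi_{L_n}|)) = 1, \qquad \frac{\log CC(\kappa_n)}{n} \to 0, \qquad \lim_{n\to\infty} \frac{\log L_n}{n} \le E_c^{\dashrightarrow}(\rho) + \epsilon \]

for any $\epsilon > 0$. Thus, the application of Lemma 10.4 indicates that there exists a sequence of codes $\{\Psi_n\}$ such that

\[ \varepsilon_{\rho^{\otimes n}}(\Psi_n) \to 0, \qquad \lim_{n\to\infty} \frac{\log|\Psi_n|}{n} \le E_c^{\dashrightarrow}(\rho) + \epsilon. \]

Therefore, we obtain

\[ R_{g,q}(\rho) \le E_c^{\dashrightarrow}(\rho). \]

Next, we prove the converse part. For any $\epsilon > 0$, we choose a sequence of codes $\Psi_n = (\mathcal{K}_n, \rho_n', \nu_n)$ such that

\[ R := \lim_{n\to\infty} \frac{1}{n}\log|\Psi_n| \le R_{g,q}(\rho) + \epsilon, \qquad \varepsilon_{\rho^{\otimes n}}(\Psi_n) \to 0. \tag{10.52} \]

Then, we have

\[ \log|\Psi_n| = \log\dim\mathcal{K}_n \ge H(\operatorname{Tr}_A \rho_n') \overset{(a)}{\ge} E_p(\rho_n') \overset{(b)}{\ge} E_p(\iota_A \otimes \nu_n(\rho_n')), \tag{10.53} \]

where (a) and (b) follow from Lemma 8.13 and Condition E2', respectively. Since (10.52) yields that

\[ F(\rho^{\otimes n}, \iota_A \otimes \nu_n(\rho_n')) \to 1 \]

and $E_p$ satisfies Condition E3 (Exercise 8.52), (10.53) implies that

\[ \lim_{n\to\infty} \frac{1}{n}\log|\Psi_n| \ge \lim_{n\to\infty} \frac{1}{n} E_p(\rho^{\otimes n}). \]

Hence, using Theorem 8.12, we obtain

\[ R_{g,q}(\rho) \ge E_c^{\dashrightarrow}(\rho) = \lim_{n\to\infty} \frac{1}{n} E_p(\rho^{\otimes n}). \]

∎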


Next, to show the direct part of Theorem 10.4, we prepare the following lemma.

Lemma 10.5 Let $\kappa$ be a one-way LOCC operation. There exists a code $\Psi$ such that

\[ \tilde{\varepsilon}_{p,W}(\Psi) \le \bigl(1 - F^2(\tilde{W}_p, \kappa(|\Phi_L\rangle\langle\Phi_L|))\bigr) + \frac{1}{2}\bigl\|\tilde{W}_p - \kappa(|\Phi_L\rangle\langle\Phi_L|)\bigr\|_1, \tag{10.54} \]
\[ |\Psi| = L \cdot CC(\kappa), \tag{10.55} \]

where $CC(\kappa)$ is the size of the classical communication of $\kappa$.

(Note that any two-way LOCC operation can be simulated by one-way LOCC when the initial state is pure [21].) Lemma 10.5 will be shown later.


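Proof of Theorem 10.4 Using Lemma 10.5, we obtain the direct part as follows. Let $\kappa_n$ be a one-way LOCC operation satisfying

\[ \lim_{n\to\infty} F(\tilde{W}_p^{\otimes n}, \kappa_n(|\Phi_{L_n}\rangle\langle\Phi_{L_n}|)) = 1, \qquad \frac{\log CC(\kappa_n)}{n} \to 0, \qquad \lim_{n\to\infty} \frac{\log L_n}{n} \le E_c^{\dashrightarrow}(\tilde{W}_p) + \epsilon \]

for any $\epsilon > 0$. Thus, the application of Lemma 10.5 indicates that there exists a sequence of codes $\{\Psi_n\}$ such that

\[ \tilde{\varepsilon}_{p^n,W^{(n)}}(\Psi_n) \to 0, \qquad \lim_{n\to\infty} \frac{\log|\Psi_n|}{n} \le E_c^{\dashrightarrow}(\tilde{W}_p) + \epsilon. \]

Therefore, we obtain

\[ R_{V,q}(p, W) \le E_c^{\dashrightarrow}(\tilde{W}_p). \]

∎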
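Proof of Lemma 10.5 Construction of code $\Psi$ satisfying (10.54) and (10.55) Assume that the operation $\kappa$ has the form $\kappa = \sum_i \kappa_{A,i} \otimes \kappa_{B,i}$, where $\{\kappa_{A,i}\}_i$ is an instrument (a TP-CP-map-valued measure) on $\mathcal{H}_A$ and $\kappa_{B,i}$ is a TP-CP map on $\mathcal{H}_B$ for each $i$. Define the probability $q_x$,

\begin{align*}
q_x &:= \operatorname{Tr}\,(|e_x^A\rangle\langle e_x^A| \otimes I_B) \sum_i \kappa_{A,i} \otimes \kappa_{B,i}(|\Phi_L\rangle\langle\Phi_L|) \tag{10.56} \\
&= \sum_i \operatorname{Tr}\,\bigl(\kappa_{A,i}^*(|e_x^A\rangle\langle e_x^A|) \otimes I_B\bigr)|\Phi_L\rangle\langle\Phi_L|,
\end{align*}

the probability $p_{i,x}$, and the state $\rho_{i,x}$ as

\[ p_{i,x} := \frac{\operatorname{Tr}\,\bigl(\kappa_{A,i}^*(|e_x^A\rangle\langle e_x^A|) \otimes I_B\bigr)|\Phi_L\rangle\langle\Phi_L|}{q_x}, \qquad \rho_{i,x} := \frac{\operatorname{Tr}_A \bigl(\kappa_{A,i}^*(|e_x^A\rangle\langle e_x^A|) \otimes I_B\bigr)|\Phi_L\rangle\langle\Phi_L|}{q_x\, p_{i,x}}. \]

Now we construct the coding protocol $\Psi$. When the encoder receives the input signal $x$, he sends the state $\rho_{i,x}$ with the probability $p_{i,x}$ and sends the classical information $i$. The decoder performs the TP-CP map $\kappa_{B,i}$ depending on the classical signal $i$. Then, (10.55) follows from this construction. Also, Inequality (10.54) holds under this construction of $\Psi$, as shown below.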


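Proof of (10.54) First, we have the following inequality:

\begin{align*}
& F^2(\tilde{W}_p, \kappa(|\Phi_L\rangle\langle\Phi_L|)) \\
&= F^2\Bigl(\sum_x p_x |e_x^A\rangle\langle e_x^A| \otimes W_x,\ \sum_i \kappa_{A,i} \otimes \kappa_{B,i}(|\Phi_L\rangle\langle\Phi_L|)\Bigr) \\
&\overset{(a)}{\le} \operatorname{Tr} \sqrt{\sum_x p_x |e_x^A\rangle\langle e_x^A| \otimes W_x}\, \sqrt{\sum_i \kappa_{A,i} \otimes \kappa_{B,i}(|\Phi_L\rangle\langle\Phi_L|)} \\
&= \operatorname{Tr} \sum_x \sqrt{p_x}\, \bigl(|e_x^A\rangle\langle e_x^A| \otimes \sqrt{W_x}\bigr) \sqrt{\sum_i \kappa_{A,i} \otimes \kappa_{B,i}(|\Phi_L\rangle\langle\Phi_L|)} \\
&= \sum_x \sqrt{p_x}\, \operatorname{Tr}_B \sqrt{W_x}\, \Bigl(\operatorname{Tr}_A (|e_x^A\rangle\langle e_x^A| \otimes I_B) \sqrt{\sum_i \kappa_{A,i} \otimes \kappa_{B,i}(|\Phi_L\rangle\langle\Phi_L|)}\Bigr) \\
&\overset{(b)}{\le} \sum_x \sqrt{p_x}\, \operatorname{Tr}_B \sqrt{W_x}\, \sqrt{\operatorname{Tr}_A (|e_x^A\rangle\langle e_x^A| \otimes I_B) \sum_i \kappa_{A,i} \otimes \kappa_{B,i}(|\Phi_L\rangle\langle\Phi_L|)} \\
&\overset{(c)}{=} \sum_x \sqrt{p_x q_x}\, \operatorname{Tr}_B \sqrt{W_x}\, \sqrt{\sum_i p_{i,x}\kappa_{B,i}(\rho_{i,x})} \\
&= \sum_x p_x \operatorname{Tr}_B \sqrt{W_x}\, \sqrt{\sum_i p_{i,x}\kappa_{B,i}(\rho_{i,x})} + \sum_x \bigl(\sqrt{p_x q_x} - p_x\bigr) \operatorname{Tr}_B \sqrt{W_x}\, \sqrt{\sum_i p_{i,x}\kappa_{B,i}(\rho_{i,x})}, \tag{10.57}
\end{align*}

where (a) follows from the basic inequality $F^2(\rho, \sigma) \le \operatorname{Tr}\sqrt{\rho}\sqrt{\sigma}$, and (b) follows from Exercise 1.26 and Condition @ of Theorem A.1 because $\sqrt{t}$ is matrix concave. Equation (c) follows from

\begin{align*}
q_x \sum_i p_{i,x}\kappa_{B,i}(\rho_{i,x})
&= \sum_i \kappa_{B,i}\bigl(\operatorname{Tr}_A (\kappa_{A,i}^*(|e_x^A\rangle\langle e_x^A|) \otimes I_B)|\Phi_L\rangle\langle\Phi_L|\bigr) \\
&= \sum_i \kappa_{B,i}\bigl(\operatorname{Tr}_A (|e_x^A\rangle\langle e_x^A| \otimes I_B)(\kappa_{A,i} \otimes \iota_B)(|\Phi_L\rangle\langle\Phi_L|)\bigr) \\
&= \operatorname{Tr}_A (|e_x^A\rangle\langle e_x^A| \otimes I_B) \sum_i \kappa_{A,i} \otimes \kappa_{B,i}(|\Phi_L\rangle\langle\Phi_L|).
\end{align*}

Then, we have

\begin{align*}
\tilde{\varepsilon}_{p,W}(\Psi) &= 1 - \sum_x p_x F\Bigl(W_x, \sum_i p_{i,x}\kappa_{B,i}(\rho_{i,x})\Bigr) \\
&\le 1 - \sum_x p_x \operatorname{Tr}_B \sqrt{W_x}\, \sqrt{\sum_i p_{i,x}\kappa_{B,i}(\rho_{i,x})} \\
&\overset{(a)}{\le} 1 - F^2(\tilde{W}_p, \kappa(|\Phi_L\rangle\langle\Phi_L|)) + \sum_x \bigl(\sqrt{p_x q_x} - p_x\bigr) \operatorname{Tr}_B \sqrt{W_x}\, \sqrt{\sum_i p_{i,x}\kappa_{B,i}(\rho_{i,x})}, \tag{10.58}
\end{align*}

where (a) follows from (10.57). Further, the second term of the RHS of (10.58) is evaluated by

\begin{align*}
\sum_x \bigl(\sqrt{p_x q_x} - p_x\bigr) \operatorname{Tr}_B \sqrt{W_x}\, \sqrt{\sum_i p_{i,x}\kappa_{B,i}(\rho_{i,x})}
&\le \sum_x \bigl(\sqrt{p_x q_x} - p_x\bigr)_+ = \sum_x \Bigl(\sqrt{\frac{q_x}{p_x}} - 1\Bigr)_+ p_x \le \sum_x \Bigl(\frac{q_x}{p_x} - 1\Bigr)_+ p_x \\
&= \sum_x (q_x - p_x)_+ = \frac{1}{2}\|q - p\|_1 \le \frac{1}{2}\bigl\|\tilde{W}_p - \kappa(|\Phi_L\rangle\langle\Phi_L|)\bigr\|_1, \tag{10.59}
\end{align*}

where $(t)_+$ is $t$ when $t$ is positive and 0 otherwise. The final inequality follows from the definition of the distribution $q$ in (10.56). Hence, (10.54) follows from (10.58) and (10.59). ∎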


10.7 Compression with Classical Memory 


In the previous section, we treated visible compression with quantum memory. In this section, we consider the compression rate with classical memory. This problem was first discussed by Hayden et al. [12]. In this problem, when the state $W_x$ is to be sent to the decoder, the encoder is given by a stochastic transition matrix $Q$ with the input system $\mathcal{X}$ and the output system $\{1, \ldots, M\}$. The decoder is represented by a c-q channel $\{W_i'\}_{i=1}^M$ with the output system $\mathcal{H}$. Hence, our code in this problem is given by the triplet $\Psi_c := (M, Q, W')$, which can be regarded as a code in the visible scheme. Then, the optimal compression rate is defined as³

\[ R_{V,c}(p, W) := \inf_{\{\Psi_c^{(n)}\}} \Bigl\{ \limsup_{n\to\infty} \frac{1}{n}\log|\Psi_c^{(n)}| \Bigm| \varepsilon_{p^n,W^{(n)}}(\Psi_c^{(n)}) \to 0 \Bigr\}. \tag{10.60} \]

Clearly, the inequality

\[ R_{V,c}(p, W) \ge R_{V,q}(p, W) \]

holds.

Theorem 10.6 [45]

\[ R_{V,c}(p, W) = C(\tilde{W}_p) = C_c(\tilde{W}_p). \tag{10.61} \]

Note that the quantities $C(\tilde{W}_p)$ and $C_c(\tilde{W}_p)$ are defined in Sect. 8.11.


Similar to the previous section, we can consider the bipartite state generation via a classical channel. This task can be formulated by restricting the channel to the classical channel in the bipartite state generation. That is, an operation $\Psi = (\mathcal{K}, \rho', \nu)$ for the bipartite state generation can be regarded as an operation for the bipartite state generation via a classical channel when $\rho' = \sum_i P_i \rho' P_i$, where $P_i := I_A \otimes |i\rangle\langle i|$ and the CONS $\{|i\rangle\}$ spans the space $\mathcal{K}$. In this case, the state $\rho'$ is written as $\sum_{i=1}^M \rho_i^A \otimes |i\rangle\langle i|$, where $M = \dim\mathcal{K}$. So, we write the operation by the triple $\Psi_c = (M, \rho', \nu)$.

Similar to the previous section, when $\rho$ is given as $\tilde{W}_p$, the code $\Psi_c = (M, Q, \nu)$ for visible compression with classical memory is converted to the code $\tilde{\Psi}_c = (M, \rho', \nu)$ for bipartite state generation, where $\rho'$ is given as follows.

\[ \rho' = \sum_{i=1}^M \sum_x Q_x(i)\, p_x |e_x^A\rangle\langle e_x^A| \otimes |i\rangle\langle i|. \tag{10.62} \]

In this correspondence,

\[ \tilde{\varepsilon}_{p,W}(\Psi_c) = \varepsilon_{\tilde{W}_p}(\tilde{\Psi}_c). \tag{10.63} \]

Hence, we have

\[ \min_{\Psi_c}\{\tilde{\varepsilon}_{p,W}(\Psi_c) \mid |\Psi_c| = M\} \ge \min_{\tilde{\Psi}_c}\{\varepsilon_{\tilde{W}_p}(\tilde{\Psi}_c) \mid |\tilde{\Psi}_c| = M\}. \tag{10.64} \]

³ The subscript c denotes classical memory.

For the bipartite state generation via a classical channel, we define the following quantity:

\[ R_{g,c}(\rho) := \inf_{\{\Psi_c^{(n)}\}} \Bigl\{ \limsup_{n\to\infty} \frac{1}{n}\log|\Psi_c^{(n)}| \Bigm| \varepsilon_{\rho^{\otimes n}}(\Psi_c^{(n)}) \to 0 \Bigr\}. \tag{10.65} \]

Then, the following theorem holds.

Theorem 10.7

\[ R_{g,c}(\rho) = C(\rho) = C_c(\rho). \tag{10.66} \]

Before proving Theorem 10.6, we show Theorem 10.7. Due to (10.64), Theorem 10.7 yields the converse type inequality $R_{V,c}(p, W) \ge C(\tilde{W}_p)$. So, after the proof of Theorem 10.7, we show the direct part of Theorem 10.6.

The direct part of Theorem 10.7 essentially is obtained by the following lemma. 


Lemma 10.6 Given $M$ states $\rho_i^A$ on $\mathcal{H}_A$ and $M$ states $\rho_i^B$ on $\mathcal{H}_B$ for $i = 1, \ldots, M$, there exists a code $\Psi_c$ such that

\[ \varepsilon_\rho(\Psi_c) \le \frac{1}{2}\Bigl\|\frac{1}{M}\sum_{i=1}^M \rho_i^A \otimes \rho_i^B - \rho\Bigr\|_1, \tag{10.67} \]
\[ |\Psi_c| = M. \tag{10.68} \]

This lemma can be shown as follows. First, Alice prepares the random variable $X$ subject to the uniform distribution on $\{1, \ldots, M\}$ and sends it to Bob via the noiseless classical channel. Then, when the random variable is $i$, Alice and Bob generate the states $\rho_i^A$ and $\rho_i^B$, respectively. Due to (3.48), this operation satisfies the condition for the above operation $\Psi_c$.


Proof of Theorem 10.7 First, we prove the direct part. Lemma 10.6 and the definition (8.198) of $C_c(\rho)$ yield that

\[ R_{g,c}(\rho) \le C_c(\rho). \]

Next, we prove the converse part. For any $\epsilon > 0$, we choose a sequence of codes $\Psi_c^{(n)} = (M_n, \rho_n', W'^{(n)})$ such that

\[ \lim_{n\to\infty} \frac{1}{n}\log|\Psi_c^{(n)}| \le R_{g,c}(\rho) + \epsilon, \qquad F\Bigl(\sum_{i=1}^{M_n} \rho_{n,i}^A \otimes W_i'^{(n)},\ \rho^{\otimes n}\Bigr) \to 1, \]

where $\rho_n' = \sum_i \rho_{n,i}^A \otimes |i\rangle\langle i|$. Hence,

\[ \Bigl\|\sum_{i=1}^{M_n} \rho_{n,i}^A \otimes W_i'^{(n)} - \rho^{\otimes n}\Bigr\|_1 \to 0. \]

The definition (8.198) of $C_c(\rho)$ implies that

\[ \lim_{n\to\infty} \frac{1}{n}\log|\Psi_c^{(n)}| = \lim_{n\to\infty} \frac{1}{n}\log M_n \ge C_c(\rho), \tag{10.69} \]

which implies that

\[ R_{g,c}(\rho) \ge C_c(\rho). \]

∎
Next, to show the direct part of Theorem 10.6, we prepare the following lemma.

Lemma 10.7 Given $M$ states $\rho_i^A$ on $\mathcal{H}_A$ and $M$ states $\rho_i^B$ on $\mathcal{H}_B$ for $i = 1, \ldots, M$, there exists a code $\Psi_c$ such that

\[ \tilde{\varepsilon}_{p,W}(\Psi_c) \le \Bigl\|\frac{1}{M}\sum_{i=1}^M \rho_i^A \otimes \rho_i^B - \tilde{W}_p\Bigr\|_1, \tag{10.70} \]
\[ |\Psi_c| = M. \tag{10.71} \]

Lemma 10.7 will be shown later.

Proof of Theorem 10.6 Using Lemma 10.7, we obtain the direct part as follows. Lemma 10.7 and the definition (8.198) of $C_c(\rho)$ yield that

\[ R_{V,c}(p, W) \le C_c(\tilde{W}_p). \]

∎
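Proof of Lemma 10.7 Construction of code $\Psi_c$ satisfying (10.70) and (10.71). First, we define the distribution $q$ on $\mathcal{X}$:

\[ q_x := \frac{1}{M}\sum_{i=1}^M \langle e_x^A|\rho_i^A|e_x^A\rangle. \]

Next, we define the encoder $Q_x$ and the decoder $W_i'$ as

\[ Q_x(i) := \frac{\langle e_x^A|\rho_i^A|e_x^A\rangle}{M q_x}, \qquad W_i' := \rho_i^B. \]

Then, (10.71) follows from this construction. Also, Inequality (10.70) holds under this construction of $\Psi_c$, as shown below.

Proof of (10.70) The recovered state of the above operation is

\[ \tilde{\rho} := \sum_{x\in\mathcal{X}} p_x |e_x^A\rangle\langle e_x^A| \otimes \sum_{i=1}^M Q_x(i) W_i' = \sum_{x\in\mathcal{X}} p_x |e_x^A\rangle\langle e_x^A| \otimes \sum_{i=1}^M \frac{\langle e_x^A|\rho_i^A|e_x^A\rangle}{M q_x}\rho_i^B. \]

Now, we introduce another state:

\[ \tilde{\rho}' := \sum_{x\in\mathcal{X}} q_x |e_x^A\rangle\langle e_x^A| \otimes \sum_{i=1}^M \frac{\langle e_x^A|\rho_i^A|e_x^A\rangle}{M q_x}\rho_i^B. \]

Then, applying the monotonicity of the trace norm to the partial trace and the pinching of the PVM $\{|e_x^A\rangle\langle e_x^A|\}$, we have

\[ \|q - p\|_1 \le \Bigl\|\frac{1}{M}\sum_{i=1}^M \rho_i^A \otimes \rho_i^B - \tilde{W}_p\Bigr\|_1. \]

Hence,

\[ \|\tilde{\rho} - \tilde{W}_p\|_1 \le \|\tilde{\rho} - \tilde{\rho}'\|_1 + \|\tilde{\rho}' - \tilde{W}_p\|_1 = \|q - p\|_1 + \|\tilde{\rho}' - \tilde{W}_p\|_1. \]

Since $\tilde{\rho}'$ is the pinching of $\frac{1}{M}\sum_i \rho_i^A \otimes \rho_i^B$ by the same PVM and $\tilde{W}_p$ is invariant under this pinching, the second term is bounded in the same way. Thus, (3.48) implies (10.70). ∎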


10.8 Compression with Shared Randomness 


Next, we consider the case when the sender and decoder share a common random number a priori. In this problem, the encoder is given by a c-q channel $T_X$ with the input system $\mathcal{X}$ and the output system $\mathcal{K}$, where $T_X$ depends on the common random number $X$. The decoder is represented by a TP-CP map $\nu_X$ from $\mathcal{K}$ to $\mathcal{H}$, which also depends on the common random number $X$. Hence, our code in this problem is given by the triplet $\Psi_r := (\mathcal{K}, T_X, \nu_X)$. The error is defined as

\[ \varepsilon_{p,W}(\Psi_r) := \mathrm{E}_X \sum_{x\in\mathcal{X}} p_x \bigl(1 - F^2(W_x, \nu_X \circ T_X(x))\bigr). \]

Further, when the storage is a classical system, the problem is modified as follows. That is, the encoder is given by a stochastic transition matrix $Q_X$ with the input system $\mathcal{X}$ and the output system $\{1, \ldots, M\}$, where $Q_X$ depends on the common random number. The decoder is represented by a c-q channel $\{W_{X,i}'\}_{i=1}^M$ with the output system $\mathcal{H}$. The c-q channel $W_X'$ also depends on the common random number. Hence, our code is given by the triplet $\Psi_{c,r} := (M, Q_X, W_X')$.⁴ Then, these optimal compression rates are defined as

\[ R_{V,q,r}(p, W) := \inf_{\{\Psi_r^{(n)}\}} \Bigl\{ \limsup_{n\to\infty} \frac{1}{n}\log|\Psi_r^{(n)}| \Bigm| \varepsilon_{p^n,W^{(n)}}(\Psi_r^{(n)}) \to 0 \Bigr\}, \tag{10.72} \]
\[ R_{V,c,r}(p, W) := \inf_{\{\Psi_{c,r}^{(n)}\}} \Bigl\{ \limsup_{n\to\infty} \frac{1}{n}\log|\Psi_{c,r}^{(n)}| \Bigm| \varepsilon_{p^n,W^{(n)}}(\Psi_{c,r}^{(n)}) \to 0 \Bigr\}. \tag{10.73} \]

Clearly, we have

\[ R_{V,c}(p, W) \ge R_{V,c,r}(p, W) \ge R_{V,q,r}(p, W), \qquad R_{V,q}(p, W) \ge R_{V,q,r}(p, W). \tag{10.74} \]


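Lemma 10.8

\[ R_{V,q,r}(p, W) \ge I(p, W) = I_{\tilde{W}_p}(A:B) = C_d^{A\to B}(\tilde{W}_p). \tag{10.75} \]

Proof Let $\Psi_r^{(n)} := (\mathcal{K}_n, T_X^{(n)}, \nu_X^{(n)})$ be a sequence of codes achieving the optimal rate $R_{V,q,r}(p, W)$. Defining the bipartite state $\rho_n := \sum_{x\in\mathcal{X}^n} p_x^n |e_x^A\rangle\langle e_x^A| \otimes (\mathrm{E}_X \nu_X^{(n)} T_X^{(n)}(x))$, we have

\begin{align*}
\log|\Psi_r^{(n)}| &\ge \mathrm{E}_X\Bigl[ H\Bigl(\sum_{x\in\mathcal{X}^n} p_x^n T_X^{(n)}(x)\Bigr) - \sum_{x\in\mathcal{X}^n} p_x^n H\bigl(T_X^{(n)}(x)\bigr) \Bigr] \\
&= \mathrm{E}_X \sum_{x\in\mathcal{X}^n} p_x^n D\Bigl(T_X^{(n)}(x) \Bigm\| \sum_{x'\in\mathcal{X}^n} p_{x'}^n T_X^{(n)}(x')\Bigr) \\
&\overset{(a)}{\ge} \mathrm{E}_X \sum_{x\in\mathcal{X}^n} p_x^n D\Bigl(\nu_X^{(n)} T_X^{(n)}(x) \Bigm\| \sum_{x'\in\mathcal{X}^n} p_{x'}^n \nu_X^{(n)} T_X^{(n)}(x')\Bigr) \\
&\overset{(b)}{\ge} \sum_{x\in\mathcal{X}^n} p_x^n D\Bigl(\mathrm{E}_X \nu_X^{(n)} T_X^{(n)}(x) \Bigm\| \sum_{x'\in\mathcal{X}^n} p_{x'}^n \mathrm{E}_X \nu_X^{(n)} T_X^{(n)}(x')\Bigr) \\
&= H\Bigl(\sum_{x\in\mathcal{X}^n} p_x^n \mathrm{E}_X \nu_X^{(n)} T_X^{(n)}(x)\Bigr) - \sum_{x\in\mathcal{X}^n} p_x^n H\bigl(\mathrm{E}_X \nu_X^{(n)} T_X^{(n)}(x)\bigr) = I_{\rho_n}(A:B),
\end{align*}

where (a) and (b) follow from the monotonicity (5.36) and joint convexity (5.38) of quantum relative entropy, respectively. From the choice of our code we can check that $F(\rho_n, \tilde{W}_p^{\otimes n}) \to 1$. Hence, using Fannes inequality (Theorem 5.12), we have

\[ \liminf_{n\to\infty} \frac{1}{n}\log|\Psi_r^{(n)}| \ge \liminf_{n\to\infty} \frac{1}{n} I_{\rho_n}(A:B) = \liminf_{n\to\infty} \frac{1}{n} I_{\tilde{W}_p^{\otimes n}}(A:B) = I(p, W). \]

∎

In fact, these optimal rates are calculated in the classical case.

⁴ The last subscript r denotes shared "randomness."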


Theorem 10.8 (Bennett et al. [22], Dür et al. [23]) If all of the states $W_x$ are commutative, we have

\[ R_{V,q,r}(p, W) = R_{V,c,r}(p, W) = I(p, W) = C_d^{A\to B}(\tilde{W}_p). \tag{10.76} \]

Hence, from (10.74) we have

\[ R_{V,c}(p, W) \ge R_{V,q}(p, W) \ge R_{V,q,r}(p, W) = R_{V,c,r}(p, W). \tag{10.77} \]

Proof From Theorem 10.6 it is sufficient to show the inequality $R_{V,c,r}(p, W) \le I(p, W)$. Now we construct a protocol achieving the rate $R = I(p, W) + \epsilon$ for any $\epsilon > 0$ by employing Lemma 10.9 given below. First, we apply Lemma 10.9 to the case with $P_{Y|X} = W^n$, $P_X = p^n$, $M = e^{nR}$, and $\delta = e^{-nr}$. Then, there exists a code $\Psi_{c,r}^{(n)}$ such that

\[ \varepsilon_{p^n,W^{(n)}}(\Psi_{c,r}^{(n)}) \le e^{-nr} + \sum_x p^n(x)\, W_x^n\bigl\{ W_x^n \ge e^{n(R-r)} W_p^n \bigr\}. \]

In the above discussion, we take the expectation for $x$ under the distribution $p^n$. Applying (2.168) to $X = \log\frac{W_x^n}{W_p^n}$, we have

\[ \sum_x p^n(x)\, W_x^n\bigl\{ W_x^n \ge e^{n(R-r)} W_p^n \bigr\} \le e^{ns(D_{1+s}(p \times W \| p \otimes W_p) - R + r)} \quad \text{for } \forall s \ge 0. \]

Hence,

\[ \varepsilon_{p^n,W^{(n)}}(\Psi_{c,r}^{(n)}) \le e^{-nr} + e^{ns(D_{1+s}(p \times W \| p \otimes W_p) - R + r)} \quad \text{for } \forall s \ge 0. \tag{10.78} \]

Due to Exercise 10.2, Inequality (10.78) implies that the rate $R = I(p, W) + \epsilon$ for any $\epsilon > 0$ is achievable. Hence, we obtain (10.76). ∎


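Lemma 10.9 For any classical source $\{(P_X(x), P_{Y|X}(y|x))\}$, there exists a code $\Psi_{c,r} := (M, Q_X, Q_X')$ such that

\[ \frac{1}{2}\sum_y \Bigl| \mathrm{E}_X \sum_{i=1}^M (Q_X')_i(y)\,(Q_X)_x(i) - P_{Y|X}(y|x) \Bigr| \le \delta + \sum_{y:\ \frac{1}{M}\frac{P_{Y|X}(y|x)}{P_Y(y)} \ge \delta} P_{Y|X}(y|x) \tag{10.79} \]

for $1 > \delta > 0$, where $P_Y(y) := \sum_x P_X(x) P_{Y|X}(y|x)$.

Proof of Lemma 10.9 First, the encoder and the decoder prepare the $M$ i.i.d. common random numbers $Y_1, \ldots, Y_M$ subject to $P_Y(y) := \sum_x P_{Y|X}(y|x) P_X(x)$. When the encoder receives the original message $x$, he sends the signal $i$ obeying the distribution

\[ P(i) := \frac{P_{X|Y}(x|Y_i)}{P_{X|Y}(x|Y_1) + \cdots + P_{X|Y}(x|Y_M)}, \qquad \text{where } P_{X|Y}(x|y) := P_{Y|X}(y|x)\frac{P_X(x)}{P_Y(y)}. \]

The receiver recovers $y = Y_i$ when he receives $i$. In this protocol, when the original message is $x$, the probability of $i = 1$ and $y = Y_1$ is given as

\[ \mathrm{E}_{Y_2,\ldots,Y_M} \frac{P_Y(y) P_{X|Y}(x|y)}{P_{X|Y}(x|y) + P_{X|Y}(x|Y_2) + \cdots + P_{X|Y}(x|Y_M)}. \]

Hence, the recovered signal is equal to $y$ with the probability

\[ \mathrm{E}_{Y_2,\ldots,Y_M} \frac{M P_Y(y) P_{X|Y}(x|y)}{P_{X|Y}(x|y) + P_{X|Y}(x|Y_2) + \cdots + P_{X|Y}(x|Y_M)}. \]

Thus, since $t \mapsto 1/t$ is convex,

\begin{align*}
& \sum_y \Bigl( P_{Y|X}(y|x) - \mathrm{E}_{Y_2,\ldots,Y_M} \frac{M P_Y(y) P_{X|Y}(x|y)}{P_{X|Y}(x|y) + \sum_{j=2}^M P_{X|Y}(x|Y_j)} \Bigr)_+ \\
&\overset{(a)}{\le} \sum_y \Bigl( P_{Y|X}(y|x) - \frac{M P_Y(y) P_{X|Y}(x|y)}{P_{X|Y}(x|y) + (M-1)P_X(x)} \Bigr)_+ \\
&\overset{(b)}{=} \sum_y P_{Y|X}(y|x) \Bigl( 1 - \frac{1}{1 + \frac{1}{M}\bigl(\frac{P_{Y|X}(y|x)}{P_Y(y)} - 1\bigr)} \Bigr)_+ \\
&\overset{(c)}{\le} \sum_{y:\ 0 \le \frac{1}{M}\bigl(\frac{P_{Y|X}(y|x)}{P_Y(y)} - 1\bigr) \le \delta} P_{Y|X}(y|x)\,\frac{1}{M}\Bigl(\frac{P_{Y|X}(y|x)}{P_Y(y)} - 1\Bigr) + \sum_{y:\ \frac{1}{M}\bigl(\frac{P_{Y|X}(y|x)}{P_Y(y)} - 1\bigr) > \delta} P_{Y|X}(y|x) \\
&\le \delta + \sum_{y:\ \frac{1}{M}\bigl(\frac{P_{Y|X}(y|x)}{P_Y(y)} - 1\bigr) > \delta} P_{Y|X}(y|x) \\
&\le \delta + \sum_{y:\ \frac{1}{M}\frac{P_{Y|X}(y|x)}{P_Y(y)} \ge \delta} P_{Y|X}(y|x),
\end{align*}

where (a) and (b) follow from Jensen inequality and Exercise 10.6, respectively, and (c) follows from Exercise 10.5 and

\[ \frac{1}{M}\Bigl(\frac{P_{Y|X}(y|x)}{P_Y(y)} - 1\Bigr) \ge -\frac{1}{M} > -1. \]

∎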
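The protocol in this proof can be simulated exactly for small alphabets. The sketch below is illustrative only: the function names `simulated_channel` and `total_variation`, and the example channel, are assumptions introduced here, and the channel is assumed to have full support so that no division by zero occurs. It enumerates all realizations of the shared random numbers $Y_1, \ldots, Y_M$ and computes the distribution of the recovered signal for each input $x$.

```python
import itertools


def simulated_channel(P_X, P_YgX, M):
    """Exact output law of the shared-randomness protocol for each input x.

    P_X[x] is the input distribution, P_YgX[x][y] the target channel, and M the
    number of shared i.i.d. samples Y_1..Y_M drawn from the output marginal P_Y.
    """
    nx, ny = len(P_X), len(P_YgX[0])
    P_Y = [sum(P_X[x] * P_YgX[x][y] for x in range(nx)) for y in range(ny)]
    # Bayes posterior P_{X|Y}(x|y) = P_{Y|X}(y|x) P_X(x) / P_Y(y).
    P_XgY = [[P_YgX[x][y] * P_X[x] / P_Y[y] for y in range(ny)] for x in range(nx)]
    out = [[0.0] * ny for _ in range(nx)]
    for x in range(nx):
        for ys in itertools.product(range(ny), repeat=M):
            prob = 1.0
            for y in ys:
                prob *= P_Y[y]
            total = sum(P_XgY[x][y] for y in ys)
            # The encoder picks index i with probability P_{X|Y}(x|Y_i) / total,
            # and the decoder outputs Y_i.
            for y in ys:
                out[x][y] += prob * P_XgY[x][y] / total
    return out


def total_variation(a, b):
    """Half the l1 distance between two distributions."""
    return 0.5 * sum(abs(u - v) for u, v in zip(a, b))


if __name__ == "__main__":
    P_X = [0.5, 0.5]
    W = [[0.9, 0.1], [0.2, 0.8]]  # example channel, assumed for illustration
    for M in (1, 2, 4, 8):
        sim = simulated_channel(P_X, W, M)
        print(M, max(total_variation(sim[x], W[x]) for x in range(len(P_X))))
```

With $M = 1$ the output is simply $P_Y$ regardless of the input; as $M$ grows the simulated conditional law moves toward $P_{Y|X}$, which is the effect the lemma quantifies.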
Exercises

10.5 Prove the inequality $\frac{x}{1+x} \le \min\{x, 1\}$ for any real number $x > 0$.

10.6 Show that

\[ \frac{M P_Y(y) P_{X|Y}(x|y)}{P_{X|Y}(x|y) + (M-1)P_X(x)} = \frac{P_{Y|X}(y|x)}{1 + \frac{1}{M}\bigl(\frac{P_{Y|X}(y|x)}{P_Y(y)} - 1\bigr)}. \]

10.7 Prove (10.40).

10.8 Consider the bipartite state generation via a channel with shared randomness. Our code is given by the triplet $\Psi_r = (\mathcal{K}, \rho_X', \nu_X)$, where $\rho_X'$ is the generated state depending on the common randomness $X$ and $\nu_X$ is the decoder depending on the common randomness $X$. When the target bipartite state is $\rho$, we impose the condition $\operatorname{Tr}_{\mathcal{K}} \rho_X' = \operatorname{Tr}_B \rho$ for any value of the common randomness $X$. Then, we can define the size $|\Psi_r|$ and the error $\varepsilon_\rho(\Psi_r)$ for the target bipartite state $\rho$ in the same way. Then, the optimal compression rate is defined as

\[ R_{g,q,r}(\rho) := \inf_{\{\Psi_r^{(n)}\}} \Bigl\{ \limsup_{n\to\infty} \frac{1}{n}\log|\Psi_r^{(n)}| \Bigm| \varepsilon_{\rho^{\otimes n}}(\Psi_r^{(n)}) \to 0 \Bigr\}. \tag{10.80} \]

Show that

\[ R_{g,q,r}(\rho) = I_\rho(A:B). \tag{10.81} \]


10.9 Relation to Channel Capacities 


In the previous section, we have discussed the simulation of a c-q channel by the pair of classical noiseless memory and shared randomness. This section discusses the relation between this simulation problem and c-q channel coding.

In the above discussion, we consider the average error under the prior distribution on the input system. Sometimes it is suitable to treat the worst error with respect to the input signal as follows:

\[ C_{c,r}^R(W) := \inf_{\{\Psi_{c,r}^{(n)}\}} \Bigl\{ \limsup_{n\to\infty} \frac{1}{n}\log|\Psi_{c,r}^{(n)}| \Bigm| \tilde{\varepsilon}_{W^{(n)}}(\Psi_{c,r}^{(n)}) \to 0 \Bigr\}, \tag{10.82} \]
\[ \tilde{\varepsilon}_W(\Psi_{c,r}) := \max_x \Bigl( 1 - F\Bigl(W_x, \mathrm{E}_X \sum_i (W_X')_i\,(Q_X)_x(i)\Bigr) \Bigr). \]

This problem is called the reverse Shannon theorem, and is closely related to c-q channel coding. To discuss this relation, we focus on the c-q channel capacity with shared randomness:

\[ C_{c,r}(W) := \sup_{\{\Phi_X^{(n)}\}} \Bigl\{ \liminf_{n\to\infty} \frac{1}{n}\log|\Phi_X^{(n)}| \Bigm| \mathrm{E}_X \varepsilon_{W^{(n)}}[\Phi_X^{(n)}] \to 0 \Bigr\}, \tag{10.83} \]

where $\Phi_X$ is a c-q channel code randomly chosen by the random number $X$ shared by the sender and the receiver, and is written as the triplet $(M, \varphi_X, Y_X)$. Here, we assume that the size of the codes $\Phi_X$ does not depend on the shared random number.

The difference of $C_{c,r}(W)$ from the conventional c-q channel capacity $C_c(W)$ is allowing the use of the shared randomness $X$. Hence, we have

\[ C_c(W) \le C_{c,r}(W). \tag{10.84} \]

However, equality holds, because for any code $\Phi_X$ with shared randomness there exists a code $\Phi$ such that $\varepsilon_W[\Phi] \le \mathrm{E}_X \varepsilon_W[\Phi_X]$:

\[ C_c(W) = C_{c,r}(W). \tag{10.85} \]

Since the two capacities $C_{c,r}(W)$ and $C_{c,r}^R(W)$ allow use of the shared randomness $X$, the relation is similar to that between entanglement distillation and dilution, in which dilution corresponds to $C_{c,r}^R(W)$, distillation corresponds to $C_{c,r}(W)$, and maximally entangled states correspond to noiseless channels. Hence, we can show

\[ C_{c,r}(W) \le C_{c,r}^R(W). \]

Inequality (10.84) follows from the comparison between the definitions of $C_c(W)$ and $C_{c,r}(W)$. As shown in Theorem 4.1, we have

\[ \max_p I(p, W) = C_c(W). \]

Further, the following theorem holds.

Theorem 10.9 (Bennett et al. [22]) When all of the states $W_x$ are commutative,

\[ C_c(W) = C_{c,r}(W) = C_{c,r}^R(W) = \max_p I(p, W). \]

Proof It is sufficient to show $C_{c,r}^R(W) \le \max_p I(p, W)$ for this theorem. This inequality follows from the proof of Theorem 10.8 with the distribution $p = \operatorname{argmax}_p I(p, W)$. ∎
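In the commutative case, $\max_p I(p, W)$ is the mutual-information maximization for a classical channel, which can be computed numerically. The following is a minimal sketch using the Blahut-Arimoto iteration, a standard technique that is not part of the text's argument; the function names and the binary symmetric channel example are assumptions introduced here. Natural logarithms are used.

```python
import math


def mutual_information(p, W):
    """I(p, W) in nats for input distribution p and classical channel W[x][y]."""
    ny = len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(ny)]
    return sum(
        p[x] * W[x][y] * math.log(W[x][y] / q[y])
        for x in range(len(p))
        for y in range(ny)
        if p[x] > 0 and W[x][y] > 0
    )


def blahut_arimoto(W, iters=200):
    """Iteratively maximize I(p, W) over p, starting from the uniform input."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx
    for _ in range(iters):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # d[x] is the relative entropy D(W_x || q) of row x from the output marginal.
        d = [
            sum(W[x][y] * math.log(W[x][y] / q[y]) for y in range(ny) if W[x][y] > 0)
            for x in range(nx)
        ]
        weights = [p[x] * math.exp(d[x]) for x in range(nx)]
        total = sum(weights)
        p = [w / total for w in weights]
    return p, mutual_information(p, W)


if __name__ == "__main__":
    eps = 0.1  # crossover probability of an example binary symmetric channel
    W = [[1 - eps, eps], [eps, 1 - eps]]
    p_opt, capacity = blahut_arimoto(W)
    print(p_opt, capacity)
```

For a symmetric channel the uniform input is already optimal, so the iteration leaves it unchanged; for asymmetric channels the input distribution is reweighted toward rows that are far from the output marginal.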


Further, we can define $C_{c,e}(W)$ and $C_{c,e}^R(W)$ by replacing the shared randomness by the shared entanglement.⁵ Since the shared randomness can be generated from the shared entanglement,

\[ C_{c,r}(W) \le C_{c,e}(W) \le C_{c,e}^R(W) \le C_{c,r}^R(W). \tag{10.86} \]

⁵ The last subscript e denotes the shared "entanglement" while the superscript e denotes entangled input.

Hence, when all of the states $W_x$ are commutative,

\[ C_c(W) = C_{c,r}(W) = C_{c,e}(W) = C_{c,e}^R(W) = C_{c,r}^R(W) = \max_p I(p, W). \]

When $W$ is replaced by a q-q channel $\kappa$, we can consider the simulation of the output states of entangled inputs. Considering this requirement, we can define the capacity $C_{c,e}^{e,R}(\kappa)$ as the reverse capacity of $C_{c,e}^e(\kappa)$ [22, 24]. This capacity can be regarded as the capacity of teleportation through a noisy channel $\kappa$. Similarly, we have

\[ C_{c,e}^e(\kappa) \le C_{c,e}^{e,R}(\kappa). \]

Recall our treatment of $C_{c,e}^e(\kappa)$ in Sect. 9.3. Originally, the reverse capacities $C_{c,e}^R(W)$ and $C_{c,e}^{e,R}(\kappa)$ were introduced for proving the converse part of $C_{c,e}^e(\kappa)$ by Bennett et al. [22]. They proved the equation $C_{c,e}^e(\kappa_p^{GP}) = C_{c,e}^{e,R}(\kappa_p^{GP}) = \max_\rho I(\rho, \kappa_p^{GP})$ for the generalized Pauli channel $\kappa_p^{GP}$ by showing the two inequalities

\[ C_{c,e}^e(\kappa_p^{GP}) \ge \max_\rho I(\rho, \kappa_p^{GP}), \]
\[ C_{c,e}^{e,R}(\kappa_p^{GP}) \le \max_\rho I(\rho, \kappa_p^{GP}) = \log d^2 - H(p), \tag{10.87} \]

where $d$ is the dimension of the system. They also conjectured [24]

\[ C_{c,e}^e(\kappa) = C_{c,e}^{e,R}(\kappa) = \max_\rho I(\rho, \kappa). \]

In addition, when $W$ is a q-c channel, i.e., a POVM, this problem was solved as the compression of POVM by Winter [25] and Massar and Winter [26].

In the same way, we can define $C_{c,r}^e(\kappa)$ and $C_{c,r}^{e,R}(\kappa)$. Then, in the same way as (10.85) and (10.86), we can show that

\[ C_c^e(\kappa) = C_{c,r}^e(\kappa) \le C_{c,e}^e(\kappa) \le C_{c,e}^{e,R}(\kappa) \le C_{c,r}^{e,R}(\kappa). \tag{10.88} \]

However, $C_{c,r}^{e,R}(\kappa)$ is infinity when $\kappa$ is not entanglement-breaking because this capacity requires the simulation of a quantum channel by the pair of classical memory and classical shared randomness.

Moreover, replacing the classical noiseless channel by the quantum noiseless channel, we can define the capacities $C_{q,e}(\kappa)$, $C_{q,r}(\kappa)$, $C_{q,e}^R(\kappa)$, and $C_{q,r}^R(\kappa)$. Here, we measure the quality of approximation by using the entanglement fidelity. Then, in the same way as (10.88), the relations

\[ C_q(\kappa) = C_{q,r}(\kappa) \le C_{q,e}(\kappa) \le C_{q,e}^R(\kappa) \le C_{q,r}^R(\kappa) \]

hold.

Proof of (10.87) Here, we give a proof for inequality (10.87). Assume that the sender and the receiver share the maximally entangled state $|\Phi_d\rangle\langle\Phi_d|$ on the tensor product $\mathcal{H}_B \otimes \mathcal{H}_C$. When the sender performs the generalized Bell measurement $\{|u_{i,j}^{A,C}\rangle\langle u_{i,j}^{A,C}|\}_{(i,j)}$ on the composite system between the input system $\mathcal{H}_A$ and the sender's local system $\mathcal{H}_C$, he obtains the data $(i, j)$ subject to the uniform distribution $p_{\mathrm{mix},d^2}$. In this case, the generalized Pauli channel $\kappa_p^{GP}$ can be written as

\[ \kappa_p^{GP}(\rho) = \sum_{(i,j)} \sum_{(i',j')} p(i'-i, j'-j)\, X_B^{i'} Z_B^{j'} \operatorname{Tr}_{A,C}\bigl[(|u_{i,j}^{A,C}\rangle\langle u_{i,j}^{A,C}| \otimes I_B)(|\Phi_d\rangle\langle\Phi_d| \otimes \rho)\bigr] (X_B^{i'} Z_B^{j'})^*. \]

Hence, if the classical channel $Q_p(i', j'|i, j) := p(i'-i, j'-j)$ is simulated with the shared randomness, the generalized Pauli channel $\kappa_p^{GP}$ can be simulated with the shared entanglement. Since $C_{c,r}^R(Q_p) = \log d^2 - H(p)$, we have (10.87). ∎


10.10 Proof of Lemma 10.3 


We first prove the following lemma. 


Lemma 10.10 A visible encoder may be represented by a map from $\mathcal{X}$ to $\mathcal{S}(\mathcal{K})$. Consider the convex combination of codes $T$ and $T'$:

\[ (\lambda T + (1-\lambda)T')(x) := \lambda T(x) + (1-\lambda)T'(x), \qquad 0 < \lambda < 1,\ \forall x \in \mathcal{X}. \]

Then, the set of visible encoders is a convex set, and the set of extremal points (see Sect. A.4 for the definition of an extremal point) is equal to

\[ \{T \mid T(x) \text{ is a pure state } \forall x \in \mathcal{X}\}. \tag{10.89} \]

Proof When $T(x)$ is a pure state for every input $x$, $T$ is an extremal point because it is impossible to represent the encoder $T$ as a convex combination of other encoders. Hence, to complete the proof, it is sufficient to show that an arbitrary visible encoder $T(x) = \sum_{j_x} s_{j_x} |\phi_{j_x}\rangle\langle\phi_{j_x}|$ can be represented as a convex combination of encoders satisfying the condition in (10.89). Writing $\mathcal{X} = \{1, \ldots, n\}$, define a visible encoder $T(j_1, j_2, \ldots, j_n)$ by

\[ T(j_1, j_2, \ldots, j_n | i) := |\phi_{j_i}\rangle\langle\phi_{j_i}|. \]

Then, this encoder belongs to the set (10.89). Since $T = \sum_{j_1, j_2, \ldots, j_n} s_{j_1} s_{j_2} \cdots s_{j_n} T(j_1, j_2, \ldots, j_n)$, the proof is completed. ∎
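The decomposition in this proof can be checked at the level of weights. In the sketch below, which is only an illustration, each encoder output $T(x)$ is represented by the list of its spectral weights $s_{j,x}$; the function names `extremal_decomposition` and `recombine` are hypothetical. Each extremal encoder is labeled by a tuple $(j_1, \ldots, j_n)$ and weighted by the product $s_{j_1} \cdots s_{j_n}$, and summing these weights over all tuples with a given $j_x$ recovers $s_{j,x}$.

```python
import itertools
import math


def extremal_decomposition(spectra):
    """Return [(weight, (j_1, ..., j_n))] for the product-weight decomposition.

    spectra[x] lists the eigenvalue weights s_{j,x} of T(x); each tuple
    (j_1, ..., j_n) labels the pure-state encoder x -> |phi_{j_x}><phi_{j_x}|.
    """
    components = []
    for js in itertools.product(*(range(len(s)) for s in spectra)):
        weight = math.prod(spectra[x][j] for x, j in enumerate(js))
        components.append((weight, js))
    return components


def recombine(components, spectra):
    """Sum the weights of all extremal encoders that send x to eigenvector j."""
    rec = [[0.0] * len(s) for s in spectra]
    for weight, js in components:
        for x, j in enumerate(js):
            rec[x][j] += weight
    return rec


if __name__ == "__main__":
    spectra = [[0.5, 0.5], [0.2, 0.3, 0.5], [1.0]]  # example weights, assumed
    comps = extremal_decomposition(spectra)
    print(len(comps), sum(w for w, _ in comps))
    print(recombine(comps, spectra))
```

The number of extremal encoders in this decomposition is the product of the ranks of the $T(x)$, which is why the argument is used for existence rather than as a practical construction.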
We also require the following lemma for the proof of Lemma 10.3. This lemma is equivalent to Theorem 8.3, which was shown from the viewpoint of entanglement in Sect. 8.4.

Lemma 10.11 Let $\rho \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ be separable. Then,

\[ \max\{\operatorname{Tr} P\rho_A \mid P: \text{Projection in } \mathcal{H}_A \text{ with rank } k\} \ge \max\{\operatorname{Tr} P\rho \mid P: \text{Projection in } \mathcal{H}_A \otimes \mathcal{H}_B \text{ with rank } k\} \]

holds for any integer $k$.


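Proof of Lemma 10.3 According to Lemma 10.10, it is sufficient to show (10.28) for a visible encoder $T$ in (10.89). From Condition © in Theorem 5.1, there exist a space $\mathcal{H}'$ with the same dimension as $\mathcal{H}$, a pure state $\rho_0$ in $\mathcal{H}' \otimes \mathcal{H}$, and a unitary matrix $U$ in $\mathcal{K} \otimes \mathcal{H}' \otimes \mathcal{H}$ such that $\nu(\rho) = \operatorname{Tr}_{\mathcal{K},\mathcal{H}'} U(\rho \otimes \rho_0)U^*$, and the state

\[ \rho_x := \frac{(W_x \otimes I)\, U(T(x) \otimes \rho_0)U^*\, (W_x \otimes I)}{\operatorname{Tr} U(T(x) \otimes \rho_0)U^* (W_x \otimes I)} \in \mathcal{S}(\mathcal{K} \otimes \mathcal{H} \otimes \mathcal{H}') \]

is a pure state. Since $U(T(x) \otimes \rho_0)U^*$ is a pure state and $(W_x \otimes I)$ is a projection, we have

\[ \operatorname{Tr} \nu(T(x))W_x = \operatorname{Tr} U(T(x) \otimes \rho_0)U^* (W_x \otimes I) = \operatorname{Tr} U(T(x) \otimes \rho_0)U^* \rho_x. \tag{10.90} \]

Since $\operatorname{Tr}_{\mathcal{K},\mathcal{H}'} \rho_x = W_x$, we may write $\rho_x = W_x \otimes \sigma_x$ by choosing an appropriate pure state $\sigma_x \in \mathcal{S}(\mathcal{K} \otimes \mathcal{H}')$. Hence, the state $\rho_p := \sum_{x\in\mathcal{X}} p(x)\rho_x = \sum_{x\in\mathcal{X}} p(x) W_x \otimes \sigma_x$ is separable and satisfies $W_p = \operatorname{Tr}_{\mathcal{K},\mathcal{H}'} \rho_p$. Since $I_{\mathcal{K}} \ge T(x)$, we have $U(I_{\mathcal{K}} \otimes \rho_0)U^* \ge U(T(x) \otimes \rho_0)U^*$. Thus, from (10.90) we have

\begin{align*}
\sum_{x\in\mathcal{X}} p(x) \operatorname{Tr} \nu(T(x))W_x &= \sum_{x\in\mathcal{X}} p(x) \operatorname{Tr} U(T(x) \otimes \rho_0)U^* \rho_x \\
&\le \sum_{x\in\mathcal{X}} p(x) \operatorname{Tr} U(I_{\mathcal{K}} \otimes \rho_0)U^* \rho_x = \operatorname{Tr} U(I_{\mathcal{K}} \otimes \rho_0)U^* \rho_p. \tag{10.91}
\end{align*}

According to $I \ge U(I_{\mathcal{K}} \otimes \rho_0)U^* \ge 0$ and $\operatorname{Tr} U(I_{\mathcal{K}} \otimes \rho_0)U^* = \operatorname{Tr} I_{\mathcal{K}} = \dim\mathcal{K}$, we obtain

\begin{align*}
\operatorname{Tr} U(I_{\mathcal{K}} \otimes \rho_0)U^* \rho_p &\le \max\{\operatorname{Tr} P\rho_p \mid P: \text{Projection in } \mathcal{K} \otimes \mathcal{H} \otimes \mathcal{H}',\ \operatorname{rank} P = \dim\mathcal{K}\} \\
&\le \max\{\operatorname{Tr} PW_p \mid P: \text{Projection in } \mathcal{H},\ \operatorname{rank} P = \dim\mathcal{K}\}. \tag{10.92}
\end{align*}

The second inequality in (10.92) may be obtained from Lemma 10.11 and the separability of $\rho_p$. The projection $P$ on $\mathcal{H}$ satisfies

\[ \operatorname{Tr}(W_p - a)P \le \operatorname{Tr}(W_p - a)\{W_p - a \ge 0\}. \]

If the rank of $P$ is $\dim\mathcal{K}$ (i.e., if $\operatorname{Tr} P = \dim\mathcal{K}$), then

\[ \operatorname{Tr} W_p P \le a\dim\mathcal{K} + \operatorname{Tr} W_p\{W_p - a \ge 0\}. \tag{10.93} \]

From (10.91)-(10.93),

\begin{align*}
1 - \varepsilon(\Psi) &= \sum_{x\in\mathcal{X}} p(x) \operatorname{Tr} \nu(T(x))W_x \\
&\le \max\{\operatorname{Tr} PW_p \mid P: \text{Projection in } \mathcal{H},\ \operatorname{rank} P = \dim\mathcal{K}\} \\
&\le a\dim\mathcal{K} + \operatorname{Tr} W_p\{W_p - a \ge 0\}.
\end{align*}

We therefore obtain (10.28). ∎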


10.11 Historical Note 


First, we briefly treat the pure-state case. The source coding problem in the quantum 
case was initiated by Schumacher [3]. In his paper, he formulated the blind scheme 
and derived the direct part and the strong converse part assuming only unitary coding. 
Jozsa and Schumacher [4] improved this discussion. Barnum et al. [27] introduced 
the purification scheme and proved the strong converse part without assuming unitary 
coding. Further, Horodecki [19] introduced the visible scheme as an arbitrary coding 
scheme and showed the weak converse part. Further, Barnum et al. [28] pointed out 
that the previous proof by Barnum et al. [27] could be used as the proof of the strong 
converse part even in the visible scheme. In this book, Lemma 10.3 plays a central 
role in the proof of the strong converse part. This lemma was proved by Hayashi [13]. 
Winter [29] also proved the strong converse part using a related lemma. Using this 
formula, Hayashi [13] derived the optimal rate with an exponential error constraint. 
When the probability of the information source is unknown, we cannot use the coding 
protocol based on the prior distribution p. Using the type method, Jozsa et al. [17] 
constructed a fixed-length universal code achieving the optimal rate. In addition, in 
the classical case, Han [14] showed that compressed states that achieve the minimum
rate are almost uniformly random in the fixed-length scheme. In this book, part of
the quantum extension of Han's result is proved as Theorem 10.3.

In the variable-length scheme, the problem is not so easy. In the classical case, we 
can compress classical data without any loss. However, Koashi and Imoto [6] proved 
that if all information sources cannot be diagonalized simultaneously, compression 
without any loss is impossible. Of course, using Schumacher’s [3] compression, we 
can compress quantum information sources with a small error. Further, using Jozsa 
et al.’s [17] compression, we can compress quantum information sources with a 
small error based only on the knowledge of the entropy rate $H(W_p)$ if the information
is generated by an independent and identical distribution of the distribution p.
Hence, universal variable-length compression with a small error is possible if we can
estimate the entropy rate $H(W_p)$ with a negligible state reduction. For this estimation,
the estimation method in Sect. 7.4 can be applied. Using this idea, a variable-
length universal compression theorem is constructed in this book. This construction 
is slightly different from the original construction by Hayashi and Matsumoto [9]. 
The modified construction by Hayashi and Matsumoto [10] is closer to the con- 
struction of this book. Further, Hayashi and Matsumoto [9] showed that the average 
error of variable-length compression does not approach 0 exponentially in the
two-level system when the compression scheme has group covariance and achieves the
entropy rate $H(W_p)$. Jozsa and Presnell [30] applied this idea to the Lempel–Ziv
method. Bennett et al. [31] considered the complexity of universal variable-length 
compression. Hayashi [32] proposed another formulation of variable-length universal 
compression, in which there is no state reduction. In this formulation, the coding
length cannot be determined, because determining it would cause state reduction.
Hence, Hayashi [32] considered the average coding length and the Kraft inequality.

In the analysis presented in this book, we have only considered probability
distributions that satisfy the independent and identical condition for the source. Petz
and Mosonyi [33] showed that the optimal compression rate is $\lim_{n \to \infty} H(\rho_n)/n$ when
the information source $\rho_n$ is stationary. Bjelaković and Szkoła [34] extended this
result to the ergodic case. Datta and Suhov [35] treated nonstationary quantum spin
systems. Further, Bjelaković et al. [36] extended Bjelaković and Szkoła's result to
the quantum lattice system. Nagaoka and Hayashi [37] derived the optimal compression
rate without any assumption on the information source based on the quantum
information spectrum method. Using Lemma 10.3, they reduced quantum source
coding to quantum hypothesis testing. Indeed, it is expected that the above results
can be derived from the asymptotic general formula by Nagaoka and Hayashi
[37]. Kaltchenko and Yang [38] showed that this optimal rate can be attained by
fixed-length source coding in the ergodic case.

The mixed-state case was first discussed in Jozsa's talk [39]. For this case,
Horodecki derived the lower bound J(p, W) (10.40) [19] and derived the optimal 
rate (10.42) [20] in the visible case. However, our optimal rate (10.41) has a slightly 
different form. Koashi and Imoto also derived the optimal rate in the blind case 
(10.39). 

When the memory is classical, Bennett and Winter [40] pointed out that the com- 
pression problem with commutative mixed states is essentially equivalent to Wyner’s 
[41] problem (Theorem 8.13). Theorem 10.6 can be regarded as its quantum exten- 
sion. Further, Hayden et al. [12] treated the tradeoff between the sizes of classical 
memory and quantum memory with the visible scheme in the pure-state case. 

Next, let us proceed to compression with shared randomness. Bennett et al. [22] 
introduced a reverse Shannon theorem (Theorem 10.9) and proved Theorem 10.8 as 
its corollary. Dür et al. [23] also proved Theorem 10.8 independently. In this book,
we prove it via Lemma 10.9. Since this lemma has a general form, it can be extended 
to a general sequence of channels. 

Further, we can consider the tradeoff between the sizes of the classical noiseless 
channel and the shared randomness as an intermediate problem between Ry,.,-(p, W) 
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and Ry.-(p, W). Bennett and Winter [40] treated this problem in the commutative 
case. 

In the classical case, Slepian and Wolf [42] considered the compression problem 
when the information source lies in the composite system and has a correlation. 
In their problem, the encoder in each system is divided into two players, who can 
only perform local operations. However, the decoder is allowed to use both encoded 
information. Devetak and Winter [11] treated the quantum extension of this problem 
in the special case with an ensemble scheme. Ahn et al. [43] treated a more general 
case with the ensemble scheme. Further, Abeyesinghe et al. [44] treated this problem 
with a purification scheme. When there is only one encoder and the other system can
be accessed by the decoder, the problem is called source coding with side
information. This problem is slightly easier than Slepian–Wolf coding, although it is
often confused with it. The case in which the side information is quantum
and the information to be sent is classical has been discussed in detail
in the paper [46].


10.12 Solutions of Exercises 


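Exercise 10.1 Since $2(1 - F(W_x^{(n)}, \nu_n(T(x)))) \ge 1 - F(W_x^{(n)}, \nu_n(T(x)))^2 \ge 1 - F(W_x^{(n)}, \nu_n(T(x)))$,
we have

$$
2\Bigl(1 - \sum_{x \in \mathcal{X}^n} p^n_x F(W_x^{(n)}, \nu_n(T(x)))\Bigr)
\ge 1 - \sum_{x \in \mathcal{X}^n} p^n_x F(W_x^{(n)}, \nu_n(T(x)))^2
\ge 1 - \sum_{x \in \mathcal{X}^n} p^n_x F(W_x^{(n)}, \nu_n(T(x))).
$$

Hence, the condition $\sum_{x \in \mathcal{X}^n} p^n_x F(W_x^{(n)}, \nu_n(T(x))) \to 1$ is equivalent to the condition
$\sum_{x \in \mathcal{X}^n} p^n_x F(W_x^{(n)}, \nu_n(T(x)))^2 \to 1$.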


Exercise 10.2 Due to (3.48) and (3.52), we have

$$
2 d_1(W_x, \nu \circ \tau(W_x)) \ge 2(1 - F(W_x, \nu \circ \tau(W_x)))
\ge 1 - F^2(W_x, \nu \circ \tau(W_x)) \ge d_1^2(W_x, \nu \circ \tau(W_x)).
$$

Taking the expectation with respect to x and applying Jensen's inequality to $a \mapsto a^2$, we have


28 pw) = ep.wlh) = Ep, wi). 


Hence, the condition €, w(w) — 0 is equivalent with the condition e, wy) — 0. 
Hence, the optimal rate $R_{B,q}(p, W)$ does not change under this replacement. The
same holds for the other optimal rate $R_{V,q}(p, W)$.


Exercise 10.3 Equation (10.25) implies that Rp.,(o) < H(p). 
Next, for a given p, we choose p and W such that all of W, are pure and p = W,. 
Then, the second inequality of (10.17) and Theorem 10.1 guarantee that Rs a (p) = 


Rp g(P. W) = A(p). 
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Exercise 10.4 Equation (10.34) can be shown as follows: 


(a) 
dim7Y,(R,B)< > = yet) 


qeT":H(q)sR qeT":H(q)sR 


@) b) 
< » ek <(n+ le zie 


qeT":H(q)<sR 


where (a) and (b) follow from (2.155) and (2.154), respectively. 
(10.35) can be shown as follows: 


(a) 
Tr -ParaWe"< > Pays > eral 


geT":H(q)<R qe€T":H(q)<R 


<2) expr inf Dialin)) 


qeT":H(q)<sR 


() 
<(n+1)‘exp(—n inf — D(qllr)), 
q:H(q)>R 


where (a) and (b) follow from (2.156) and (2.154) respectively. 


Exercise 10.5 When $0 \le x \le 1$, we have $x + 1 \ge 1$. Hence, we have $\frac{x}{1+x} \le x = \min\{x, 1\}$.
When $1 < x$, we have $0 < x < 1 + x$, which implies $\frac{x}{1+x} < 1 = \min\{x, 1\}$.


Exercise 10.6 Since $P_{X|Y}(x|y) = \frac{P_X(x)}{P_Y(y)} P_{Y|X}(y|x)$, we have


MPy(y)Px\y (ly) 
Pxiy(aly) + (M — DPx(x) 
pps MP x (x)Pyix(ylx) 

Be Prix (ylx) + (M — 1)Px(x) 


MPy\x(y|x) 
=Py\x (|x) Pax) 4 (yy — 1) 
Py(y) 


Prix (la) (EO + lt — 1) - MPvix(v1s) 


Pyix(Qy|x) 
Py(y) +(M—)) 


Pyix(y|x) 


Prix Qa) 1 ( PrixQly) 
Prix) (“aS a 1) ere. M ( Pr) -1) 
Pyix Ola) eee Pyix (lx) 
BO + (Mm — 1) ite (eer =) 


Exercise 10.7 Equation (10.40) follows from the combination of Lemma 10.8 and 
the second inequality of (10.74). 


Exercise 10.8 Let b © (K,, p, v¥?) be a sequence of codes achieving the 
optimal rate Ry 4,-(e). We have 
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log |W | > Bxlm,(A : K) = ExD(p™ y|| Tre py ® Tra py) 
=ExD(p y|| Tre 2” ® Tra py) 
(a) 
>Ey D(vx (py) || Tre 02" @ vy (Tra py) 
ODE (n)! @n (n)! 
> xVx(p'” y)|| Trg op" @ Exvy (Tra p“"’y)) 
=I, (A: B), 


Exvx (py 


where (a) and (b) follow from the monotonicity (5.36) and joint convexity (5.38) of 
quantum relative entropy, respectively. From the choice of our code we can check 
that F(Exvy (0), p®") — 0. Hence, using Fannes inequality (Theorem (5.12)), 


we have 
ceo Sei ie pee rece 
lim inf 7 log |W," | = lim inf 7 TEx vx (p04) (A : B) 
1 
=liminf —I,ex(A: B)=I1,(A: B). 
n>coo n 
Appendix
Limits and Linear Algebra


A.1 Limits


In this text, we frequently discuss the asymptotic behavior of several problems when
the number n of prepared systems is sufficiently large. In this situation, we often
take the limit $n \to \infty$. In this section, we give a brief summary of the fundamental
properties of limits. Given a general sequence $\{a_n\}$, the limit $\lim_{n \to \infty} a_n$ does not
necessarily exist. For example, a sequence $a_n$ is a counterexample when $a_n$ diverges to
$+\infty$ or $-\infty$. In such a case, it is possible to at least denote these limits as $\lim_{n \to \infty} a_n = +\infty$
or $\lim_{n \to \infty} a_n = -\infty$. However, the sequence $a_n$ has no limit as $n \to \infty$, even
allowing possibilities such as $+\infty$ or $-\infty$, when $a_n$ is defined to be 0 when n is
even and 1 when it is odd. This is caused by its oscillatory behavior. In this case, we
can consider the upper limit $\overline{\lim}\, a_n$ and the lower limit $\underline{\lim}\, a_n$, which are given as
$\underline{\lim}\, a_n = 0$ and $\overline{\lim}\, a_n = 1$. More precisely, $\underline{\lim}\, a_n$ and $\overline{\lim}\, a_n$ are defined as follows:

$$
\underline{\lim}\, a_n := \sup\{a \mid \forall \epsilon > 0, \exists N, \forall n \ge N, a \le a_n + \epsilon\},
$$

$$
\overline{\lim}\, a_n := \inf\{a \mid \forall \epsilon > 0, \exists N, \forall n \ge N, a \ge a_n - \epsilon\}.
$$

When $\underline{\lim}\, a_n = \overline{\lim}\, a_n$, the limit $\lim_{n \to \infty} a_n$ exists and is equal to $\underline{\lim}\, a_n = \overline{\lim}\, a_n$.
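As a small illustration of the upper and lower limits, the following sketch evaluates the supremum and infimum of a late tail of the oscillating sequence discussed above. The helper name `tail_sup_inf` and the finite horizon are assumptions made for this example; a finite tail is only a proxy for the true upper and lower limits.

```python
def tail_sup_inf(a, start, horizon):
    """Return (sup, inf) of a(n) over start <= n < start + horizon."""
    values = [a(n) for n in range(start, start + horizon)]
    return max(values), min(values)


def a(n):
    # The oscillating sequence from the text: 0 for even n, 1 for odd n.
    return 0 if n % 2 == 0 else 1


# Finite-horizon proxies for the upper and lower limits.
upper, lower = tail_sup_inf(a, start=1000, horizon=1000)
```

Since the tail supremum and infimum differ no matter how late the tail starts, the ordinary limit of this sequence does not exist.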
The following three lemmas hold concerning limits. 


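Lemma A.1 Let sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ satisfy

$$
a_n + a_m \le a_{n+m} + b_{n+m}, \quad \sup_n \frac{a_n}{n} < \infty, \quad \lim_{n \to \infty} \frac{b_n}{n} = 0.
$$

Then, the limit $\lim_{n \to \infty} \frac{a_n}{n}$ exists and satisfies

$$
\lim_{n \to \infty} \frac{a_n}{n} = \overline{\lim}\, \frac{a_n}{n} = \sup_n \frac{a_n}{n}. \tag{A.1}
$$

If $a_n + a_m \ge a_{n+m} - b_{n+m}$ and $\inf_n \frac{a_n}{n} > -\infty$, $\frac{b_n}{n} \to 0$, then similarly $\lim_{n \to \infty} \frac{a_n}{n} = \underline{\lim}\, \frac{a_n}{n} = \inf_n \frac{a_n}{n}$,
as shown by considering $-a_n$.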


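Proof Fix the integer m. Then, for any integer n, there uniquely exist integers $l_n$ and
$r_n$ such that $0 \le r_n \le m - 1$ and $n = l_n m + r_n$. Thus, we have

$$
\frac{a_n}{n} = \frac{a_{l_n m + r_n}}{l_n m + r_n} \ge \frac{a_{l_n m} + a_{r_n} - b_n}{l_n m + r_n} \ge \frac{l_n a_m + a_{r_n} - b_n - b_{l_n m}}{l_n m + r_n}.
$$

Since $l_n \to \infty$ as $n \to \infty$, taking the limit $n \to \infty$, we have $\underline{\lim}\, \frac{a_n}{n} \ge \frac{a_m}{m}$ for arbitrary
m. Next, taking the limit $m \to \infty$, we have $\underline{\lim}\, \frac{a_n}{n} \ge \sup_m \frac{a_m}{m} \ge \overline{\lim}\, \frac{a_m}{m}$. Since
$\overline{\lim}\, \frac{a_n}{n} \ge \underline{\lim}\, \frac{a_n}{n}$, we obtain (A.1). ∎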
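The statement of Lemma A.1 can be illustrated numerically. The sketch below uses the assumed example $a_n = n - \sqrt{n}$ with $b_n = 0$, which is superadditive because $\sqrt{n} + \sqrt{m} \ge \sqrt{n+m}$; it is a sanity check on one sequence, not a proof.

```python
import math


def a(n):
    # Assumed example: a_n = n - sqrt(n) satisfies a_n + a_m <= a_{n+m}, so b_n = 0 works.
    return n - math.sqrt(n)


def is_superadditive(seq, limit):
    """Check seq(n) + seq(m) <= seq(n + m) for 1 <= n, m < limit (small float slack)."""
    return all(seq(n) + seq(m) <= seq(n + m) + 1e-12
               for n in range(1, limit) for m in range(1, limit))


# a_n / n = 1 - 1/sqrt(n) increases towards its supremum 1.
ratios = [a(n) / n for n in range(1, 10001)]
running_sup = max(ratios)
```

For this sequence the ratio $a_n/n$ at the largest n coincides with the running supremum, in line with (A.1).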


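Lemma A.2 Let $\{a_n\}$ and $\{b_n\}$ be two sequences of positive real numbers. Then,

$$
\overline{\lim}\, \frac{1}{n} \log(a_n + b_n) = \max\left\{ \overline{\lim}\, \frac{1}{n} \log a_n,\ \overline{\lim}\, \frac{1}{n} \log b_n \right\}.
$$

Proof Since $(a_n + b_n) \ge a_n, b_n$ and

$$
\overline{\lim}\, \frac{1}{n} \log(a_n + b_n) \ge \overline{\lim}\, \frac{1}{n} \log a_n,\ \overline{\lim}\, \frac{1}{n} \log b_n,
$$

we obtain the $\ge$ part of the proof. Since $2 \max\{a_n, b_n\} \ge (a_n + b_n)$, we have

$$
\begin{aligned}
\max\left\{ \overline{\lim}\, \frac{1}{n} \log a_n,\ \overline{\lim}\, \frac{1}{n} \log b_n \right\}
&= \overline{\lim}\, \frac{1}{n} \log \max\{a_n, b_n\} \\
&= \overline{\lim}\, \frac{1}{n} \log 2 \max\{a_n, b_n\} \ge \overline{\lim}\, \frac{1}{n} \log(a_n + b_n),
\end{aligned}
$$

which gives the reverse inequality. This completes the proof. ∎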
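The content of Lemma A.2 is that the exponential rate of a sum is governed by the larger of the two rates. A minimal numerical sketch follows, assuming the example sequences $a_n = e^{2n}$ and $b_n = n e^{3n}$ (chosen for this illustration) and working with logarithms to avoid overflow.

```python
import math


def rate_of_sum(log_a, log_b, n):
    """(1/n) log(a_n + b_n), computed from log a_n and log b_n without overflow."""
    hi = max(log_a(n), log_b(n))
    lo = min(log_a(n), log_b(n))
    return (hi + math.log1p(math.exp(lo - hi))) / n


def log_a(n):
    return 2.0 * n                 # a_n = e^{2n}, rate 2


def log_b(n):
    return 3.0 * n + math.log(n)   # b_n = n e^{3n}, rate 3


rate = rate_of_sum(log_a, log_b, 10 ** 6)
```

For large n the computed rate approaches 3, the maximum of the two individual rates, and it always lies between the larger individual rate and that rate plus $(\log 2)/n$, which is the sandwich used in the proof.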


Lemma A.3 Let $\{f_n(x)\}$ be a sequence of functions such that $f_n(x) \le f_n(y)$ if
$x \ge y$, and $f_n(x) \to 0$ if $x > 0$. There exists a sequence $\{\epsilon_n\}$ of positive real numbers
converging to zero such that $f_n(\epsilon_n) \to 0$.


Proof Let N be a positive integer. Choose positive integers $n(N)$ such that $n(N) < n(N+1)$
and $f_n\left(\frac{1}{N}\right) \le \frac{1}{N}$ for $n \ge n(N)$. We also define $\epsilon_n := \frac{1}{N}$ for $n(N) \le n < n(N+1)$.
Then, $\epsilon_n \to 0$. If $n \ge n(N)$, then $f_n(\epsilon_n) \le \frac{1}{N}$. Therefore, $f_n(\epsilon_n) \to 0$. ∎
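The construction in the proof of Lemma A.3 can be carried out explicitly for a concrete family. The sketch below assumes the example $f_n(x) = e^{-nx}$, for which $f_n(1/N) \le 1/N$ holds exactly when $n \ge N \log N$; the function names and the cutoff `max_N` are choices made for this illustration.

```python
import math


def f(n, x):
    # Assumed example family: decreasing in x, and f_n(x) -> 0 for every x > 0.
    return math.exp(-n * x)


def thresholds(max_N):
    """Strictly increasing n(1) < n(2) < ... with f_n(1/N) <= 1/N for all n >= n(N)."""
    ns, prev = [], 0
    for N in range(1, max_N + 1):
        # exp(-n / N) <= 1 / N holds exactly when n >= N log N.
        n_N = max(prev + 1, math.ceil(N * math.log(N)))
        ns.append(n_N)
        prev = n_N
    return ns


def epsilon(n, ns):
    """eps_n = 1/N for n(N) <= n < n(N+1), as in the proof."""
    N = sum(1 for n_N in ns if n_N <= n)
    return 1.0 / N


ns = thresholds(50)
eps = [epsilon(n, ns) for n in range(1, 201)]
```

On this range the values $\epsilon_n$ are non-increasing and $f_n(\epsilon_n) \le \epsilon_n$ holds for every n, which is the mechanism that forces $f_n(\epsilon_n) \to 0$.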
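For any two continuous functions f and g on an open subset $X \subset \mathbb{R}^d$, we define

$$
[f, g](a) := \min_{x \in X} \{ f(x) \mid g(x) \le a \}. \tag{A.2}
$$

Lemma A.4 When X is closed and bounded, i.e., compact,

$$
[f, g](a) = \lim_{\epsilon \downarrow 0} [f, g](a + \epsilon). \tag{A.3}
$$

Proof From the definition, for $\epsilon > 0$, $[f, g](a) \ge [f, g](a + \epsilon)$. Hence, $[f, g](a) \ge
\lim_{\epsilon \downarrow 0} [f, g](a + \epsilon)$. From the compactness, for any $\epsilon_1 > 0$ there exists $\epsilon_2 > 0$ such
that $\|x - x'\| < \epsilon_2 \Rightarrow |f(x) - f(x')| < \epsilon_1$. Further, from the compactness of X we
can choose a small number $\epsilon_3 > 0$ such that $\{x \mid g(x) \le a + \epsilon_3\} \subset \bigcup_{x' : g(x') \le a} U_{x', \epsilon_2}$,
where $U_{x', \epsilon_2}$ denotes the $\epsilon_2$-neighborhood of $x'$. Hence,

$$
\min_{x \mid g(x) \le a + \epsilon_3} f(x) \ge \min_{x \in \bigcup_{x' : g(x') \le a} U_{x', \epsilon_2}} f(x) \ge \min_{x \mid g(x) \le a} f(x) - \epsilon_1, \tag{A.4}
$$

which implies (A.3). ∎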


A.2 Singular Value Decomposition and Polar Decomposition


Any $d \times d'$ complex-valued matrix X has the form

$$
X = U_1 X' U_2^* \tag{A.5}
$$

with isometric matrices $U_1$ and $U_2$ and a diagonal matrix $X'$. This is called a singular
value decomposition (the matrix U is an isometric matrix if $U^*U$ is the identity
matrix; U is a partially isometric matrix for the subspace $\mathcal{K}$ if $U^*U$ is a projection
onto the subspace $\mathcal{K}$). Choosing a $d \times d'$ partially isometric matrix U in the range
$\{X^*Xv \mid v \in \mathbb{C}^d\}$ of $X^*X$, we have

$$
X = U|X|, \quad |X| := \sqrt{X^*X}, \tag{A.6}
$$

which is called a polar decomposition. If X is Hermitian and is diagonalizable
according to $X = \sum_i \lambda_i |u_i\rangle\langle u_i|$, then $|X| = \sum_i |\lambda_i| |u_i\rangle\langle u_i|$. Since $X^* = |X|U^*$,

$$
XX^* = U|X||X|U^* = UX^*XU^*, \quad \sqrt{XX^*} = U\sqrt{X^*X}U^*, \tag{A.7}
$$

$$
UX^*U = X. \tag{A.8}
$$

Therefore,

$$
X = \sqrt{XX^*}\,U. \tag{A.9}
$$


If X is a square matrix (i.e., $d = d'$), then U is unitary. If $d \ge d'$, then U can be
chosen to be isometric. If $d \le d'$, U can be chosen such that $U^*$ is isometric. We
now show that these two decompositions exist.

Since $X^*X$ is Hermitian, we may choose a set of mutually orthogonal vectors
$u_1, \ldots, u_l$ of norm 1 such that

$$
X^*X = \sum_{i=1}^{l} \lambda_i |u_i\rangle\langle u_i|.
$$

In the above, we choose $\{\lambda_i\}_{i=1}^{l}$ such that $\lambda_i \ge \lambda_{i+1} > 0$. Hence, l is not necessarily
equal to the dimension of the space because there may exist zero eigenvalues.
Defining $v_i := \sqrt{\frac{1}{\lambda_i}}\, X u_i$, we have

$$
\langle v_i | v_j \rangle = \sqrt{\frac{1}{\lambda_i}} \sqrt{\frac{1}{\lambda_j}} \langle u_i | X^* X | u_j \rangle = \delta_{i,j}.
$$

Furthermore, from the relation

$$
\langle v_i | X | u_j \rangle = \sqrt{\frac{1}{\lambda_i}} \langle u_i | X^* X | u_j \rangle = \sqrt{\lambda_i}\, \delta_{i,j}
$$

we can show that

$$
\sum_i \sqrt{\lambda_i}\, |v_i\rangle\langle u_i| = \sum_i |v_i\rangle\langle v_i|\, X \sum_j |u_j\rangle\langle u_j| = X. \tag{A.10}
$$

One may be concerned about the validity of the second equality if $X^*X$ has some
eigenvectors u with zero eigenvalue. However, since $\langle u | X^*X | u \rangle = 0$, we have $Xu = 0$.
Hence, the image of vector u is the zero vector in both sides of (A.10). We
define $U_2 = (u_i^j)$ and $U_1 = (v_i^j)$, which are $d \times l$ and $d' \times l$ isometric matrices,
respectively. Let $X'$ be an $l \times l$ diagonal matrix $(\sqrt{\lambda_i}\, \delta_{i,j})$. This gives us (A.5).
Using the above, we obtain the following lemma. 


Lemma A.5 Let a density matrix $\rho$ be written as

$$
\rho = \sum_{j=1}^{d'} |v_j\rangle\langle v_j|, \tag{A.11}
$$

where $\{v_j\}$ is a set of vectors that are not necessarily orthogonal. Let its diagonalization
be given by $\rho = \sum_{i=1}^{l} \lambda_i |u_i\rangle\langle u_i|$. Since $\lambda_i > 0$, l is not necessarily equal to the
dimension of the space. Then, the vector $v_j$ can be written as $v_j = \sum_{i=1}^{l} w_{j,i} \sqrt{\lambda_i}\, u_i$
by using an $l \times d'$ isometric matrix $W = (w_{j,i})$ [1].


The set of vectors $\{v_j\}$ satisfying (A.11) is called the decomposition of the density
matrix $\rho$.

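Proof Let Y be a $d' \times l$ matrix given by $(v_i^j)$. Then,

$$
\rho = \sum_{i=1}^{l} \lambda_i |u_i\rangle\langle u_i| = YY^*.
$$

Define $w_i := \sqrt{\frac{1}{\lambda_i}}\, Y^* u_i$. Then, $Y^* = \sum_{i=1}^{l} \sqrt{\lambda_i}\, |w_i\rangle\langle u_i|$. Taking its conjugate, we
obtain $Y = \sum_{i=1}^{l} \sqrt{\lambda_i}\, |u_i\rangle\langle w_i|$. Looking at the jth row, we obtain $|v_j\rangle = \sum_{i=1}^{l} (w_i^j)^* \sqrt{\lambda_i}\, |u_i\rangle$.
Since $\sum_j (w_i^j)^* (w_{i'}^j) = \delta_{i,i'}$, $(w_i^j)$ is an isometric matrix. The proof is
complete. ∎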


Next, we consider the case where X is a real $d \times d$ matrix. Since a real symmetric
matrix can be diagonalized by an orthogonal matrix, the unitary matrices $U_1$ and $U_2$
may be replaced by orthogonal matrices $O_1$ and $O_2$. In fact, we may further restrict
the orthogonal matrices to orthogonal matrices with determinant 1 (these are called
special orthogonal matrices). However, the following problem occurs. Assume that
the determinant of $O_i$ ($i = 1, 2$) is $-1$. Then, $O_i$ may be redefined by multiplying it
by a diagonal matrix with diagonal elements $-1, 1, \ldots, 1$. The redefined matrix is
then a special orthogonal matrix, and $O_1^* X O_2$ is diagonal. Choosing $O_1$ and $O_2$ in a
suitable way, all the diagonal elements of $O_1^* X O_2$ will be positive if $\det X > 0$. On
the other hand, if $\det X < 0$, then it is not possible to make all the diagonal elements
of $O_1^* X O_2$ positive for special orthogonal matrices $O_1$, $O_2$.


Exercises 
A.1 Define Jj, as (u;|u;) for a set of linearly independent vectors uw, ..., uz in H. 
Show that 
DT wa) ul = SOC) aa) ey (A.12) 
ij i,j 
Show that this is a projection to the subspace of $\mathcal{H}$ spanned by $u_1, \ldots, u_k$.


A.2 Using relation (A.6), show that 


AA* f (AA*) = Af (A*A)A*. (A.13) 


A.3 Norms of Matrices 


We often focus on the norm of the difference between two matrices as a measure of
the difference between them. There are two types of norms, the matrix norm and the
trace norm. The matrix norm $\|A\|$ of a matrix A is defined as

$$
\|A\| := \max_{\|x\| = 1} \|Ax\|. \tag{A.14}
$$

Since $\|x\| = \max_{\|y\| = 1} |\langle y, x \rangle|$, we have $\|A\| = \max_{\|y\| = \|x\| = 1} |\langle y, Ax \rangle|$; therefore,
$\|A\| = \|A^*\|$. From the definition we have

$$
\|U_1 A U_2\| = \|A\| \tag{A.15}
$$

for unitary matrices $U_1$ and $U_2$. Defining

$$
w(A) := \max_{\|x\| = 1} |\langle x, Ax \rangle|, \quad \mathrm{spr}(A) := \max\{|\lambda| : \lambda \text{ is the eigenvalue of } A\},
$$

we obtain

$$
\mathrm{spr}(A) \le w(A) \le \|A\|. \tag{A.16}
$$


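Assume that A is a Hermitian matrix. Then, it may be diagonalized as $A =
\sum_{i=1}^{d} \lambda_i |u_i\rangle\langle u_i|$. Thus, for $\|x\| = \|y\| = 1$,

$$
\begin{aligned}
|\langle y, Ax \rangle| &\le \sum_{i=1}^{d} |\lambda_i|\, |\langle y|u_i\rangle\langle u_i|x\rangle| \le \max_i |\lambda_i| \sum_{i=1}^{d} |\langle y|u_i\rangle|\, |\langle u_i|x\rangle| \\
&\le \max_i |\lambda_i| \sqrt{\sum_{i=1}^{d} |\langle y|u_i\rangle|^2} \sqrt{\sum_{i=1}^{d} |\langle u_i|x\rangle|^2} = \max_i |\lambda_i| = \mathrm{spr}(A).
\end{aligned}
$$

The above inequality implies the equality sign in (A.16). Since $\|A\|^2 = \max_{\|x\| = 1}
\langle x | A^*A | x \rangle = \mathrm{spr}(A^*A) = (\mathrm{spr}(\sqrt{A^*A}))^2$, then

$$
\|A\| = \|\sqrt{A^*A}\| = \|A^*\| = \|\sqrt{AA^*}\|. \tag{A.17}
$$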
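On the other hand, the trace norm $\|X\|_1$ of a matrix X is defined as

$$
\|X\|_1 := \max_{U : \text{unitary}} \mathrm{Tr}\, UX. \tag{A.18}
$$

Choosing a unitary matrix $U_X$ such that $X = U_X |X|$ (i.e., a polar decomposition),
we obtain (Exercise A.8)

$$
\|X\|_1 = \max_{U : \text{unitary}} \mathrm{Tr}\, UX = \mathrm{Tr}\, U_X^* X = \mathrm{Tr}\, |X|. \tag{A.19}
$$

Hence, we also have

$$
\|X^*\|_1 = \max_{U : \text{unitary}} \mathrm{Tr}\, U^* X^* = \mathrm{Tr}\, U_X X^* = \mathrm{Tr}\, |X^*|.
$$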
uU:unitary 


Further, we often focus on the 2-norm:

$$
\|X\|_2 := \sqrt{\mathrm{Tr}\, XX^*}. \tag{A.20}
$$

We have the relation

$$
\|X\|_2 \le \|X\|_1. \tag{A.21}
$$

Then, we can show that

$$
\|YX\|_i \le \|Y\| \|X\|_i \tag{A.22}
$$

for $i = 1, 2$.
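For a $2 \times 2$ Hermitian matrix all three norms can be computed from the eigenvalues in closed form, which gives a quick consistency check of the ordering $\|X\| \le \|X\|_2 \le \|X\|_1$. The sketch below is restricted to that special case; the function names are assumptions made for this illustration.

```python
import math


def hermitian_2x2_eigenvalues(p, b, r):
    """Eigenvalues of [[p, b], [conj(b), r]] with real p, r and complex b."""
    mean = (p + r) / 2.0
    radius = math.sqrt(((p - r) / 2.0) ** 2 + abs(b) ** 2)
    return mean + radius, mean - radius


def norms(p, b, r):
    """Return (matrix norm, trace norm, 2-norm) of the Hermitian matrix above."""
    lam = hermitian_2x2_eigenvalues(p, b, r)
    operator_norm = max(abs(x) for x in lam)               # equals spr(X) for Hermitian X
    trace_norm = sum(abs(x) for x in lam)                  # Tr |X|
    two_norm = math.sqrt(p * p + r * r + 2 * abs(b) ** 2)  # sqrt(Tr X X^*) from the entries
    return operator_norm, trace_norm, two_norm


operator_norm, trace_norm, two_norm = norms(1.0, 2 + 1j, -3.0)
```

The 2-norm is computed directly from the matrix entries, so its agreement with the eigenvalue-based quantities is a genuine cross-check rather than a restatement.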
Further, as generalizations of the norms $\|X\|_1$ and $\|X\|_2$, for a real number p, we
define the p-norm. For a function $f(x)$, the p-norm $\|f\|_p$ is defined as

$$
\|f\|_p := \Bigl( \sum_x |f(x)|^p \Bigr)^{1/p}. \tag{A.23}
$$

For a square matrix X, the p-norm $\|X\|_p$ is defined as

$$
\|X\|_p := (\mathrm{Tr}\, |X|^p)^{1/p}. \tag{A.24}
$$

When $p, q > 0$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, the Hölder inequality

$$
\Bigl| \sum_x f(x) g(x) \Bigr| \le \|f\|_p \|g\|_q \tag{A.25}
$$

holds for two functions f and g. The equality holds if and only if there is a constant
c such that $f(x)^p = c\, g(x)^q$. Then, the matrix Hölder inequality [2, Theorem 6.21]

$$
|\mathrm{Tr}\, XY| \le \|X\|_p \|Y\|_q \tag{A.26}
$$
holds for two matrices X and Y. Since |TrXY| < Tr|X||Y|, it is enough to show 
(A.26) for positive semidefinite matrices X and Y. The inequality (A.26) in this 
case can be shown from (6.122) of Proof of (6.17) in Sect. 6.7 when H, is a one- 
dimensional space, \ = 1/p, p48 = Y4, and 04-8 = X?. 


When $0 < p < 1$ and $q < 0$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$, we can show the reverse Hölder
inequality [3]

$$
\Bigl| \sum_x f(x) g(x) \Bigr| \ge \|f\|_p \|g\|_q \tag{A.27}
$$

for two positive-valued functions f and g, and the reverse matrix Hölder inequality

$$
|\mathrm{Tr}\, XY| \ge \|X\|_p \|Y\|_q \tag{A.28}
$$

for two positive semidefinite matrices X and Y.
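The Hölder inequality and its reverse for functions on a finite set can be checked directly. The following sketch uses two assumed positive-valued test vectors; the exponent pairs $(p, q) = (3, 3/2)$ and $(1/2, -1)$ both satisfy $1/p + 1/q = 1$.

```python
def p_norm(f, p):
    """(sum_x |f(x)|^p)^(1/p); p may be negative provided every f(x) is nonzero."""
    return sum(abs(v) ** p for v in f) ** (1.0 / p)


def inner(f, g):
    """|sum_x f(x) g(x)|."""
    return abs(sum(u * v for u, v in zip(f, g)))


f = [1.0, 2.0, 3.0]
g = [4.0, 5.0, 6.0]

# Hölder inequality: p = 3, q = 3/2.
holder_gap = p_norm(f, 3.0) * p_norm(g, 1.5) - inner(f, g)

# Reverse Hölder inequality: p = 1/2, q = -1 (f and g positive-valued).
reverse_gap = inner(f, g) - p_norm(f, 0.5) * p_norm(g, -1.0)
```

Both gaps are non-negative for these vectors. The Hölder gap vanishes when $f(x)^p = c\, g(x)^q$, for example for $g = (1, 4)$ and $f = (1, 2)$ with $p = 3$, $q = 3/2$.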
Proof of (A.28) Since (A.27) can be regarded as the diagonal case of (A.28), we show
only (A.28). It is sufficient to show the case when X and Y are invertible. The
non-invertible case can be obtained as the limit of the above case. We choose the real
number s := = and the two matrices A := log X and B := log Y. Then, we apply 


the matrix Hélder inequality (A.26) to the matrices (e4+#) rs and (e4)~ 1 with 
p! := 5 and q’ := 1+. We obtain 


AY 


er yMs 7 (ete) 


1 5 
A+B |) Tes 1) 5-4 || Ts 
errr ler = INC 


A+B A A+B A a B B 
>|lehs eT ts ll > Tre™ et > Tret = lets ll1, 


where (a) follows from the Golden–Thompson trace inequality (5.48). Therefore, we have
a ee 
(err S lens Ili llews ||, ‘*. Since Golden-Thomson trace inequality (5.48) 


. BL ae ees ; ‘ . 
yields ||e4e? ||; > |le4*#|I1, we have |le4e? I], > lle™ [I] |le~* ||’, which implies 
(A.28). | 


Exercises 


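A.3 Show that the trace norm of a Hermitian matrix $\begin{pmatrix} -a & b \\ \bar{b} & a \end{pmatrix}$ is equal to
$2\sqrt{|b|^2 + a^2}$.


A.4 Show that

$$
\|X\|_1 \ge \|\mathrm{Tr}_B X\|_1 \tag{A.29}
$$

for a matrix X in $\mathcal{H}_A \otimes \mathcal{H}_B$.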


A.5 Let A and B be square matrices of dimension d. Show that the eigenvalues of
BA are the same as the eigenvalues of AB including degeneracies if A or B possesses
the inverse.


A.6 Show that spr(AB) = spr(BA). 


A.7 Show that the function $t \mapsto t^{1/2}$ is a matrix monotone function following the
steps below.

(a) Show that $\|A^{1/2} B^{-1/2}\| \le 1$ when the Hermitian matrices B and A satisfy $B \ge
A \ge 0$ and B possesses the inverse.

(b) Show that $1 \ge \mathrm{spr}(B^{-1/4} A^{1/2} B^{-1/4})$ under the same conditions as (a).

(c) Show that

$$
B^{1/2} \ge A^{1/2} \tag{A.30}
$$

under the same conditions as (a).
(d) Show that (A.30) holds even if B does not possess the inverse.
A.8 Prove (A.19) following the steps below.

(a) Show that $\max_{v : \|v\| = 1} \langle v |\, |X|\, | u_i \rangle = \langle u_i |\, |X|\, | u_i \rangle$ for eigenvectors $u_i$ of $|X|$ of
length 1.

(b) Show that $\max_{U : \text{unitary}} \langle u_i | UX | u_i \rangle = \langle u_i | U_X^* X | u_i \rangle = \langle u_i |\, |X|\, | u_i \rangle$, where $U_X$ is
given by using the polar decomposition $X = U_X |X|$.

(c) Prove (A.19).


A.9 Show (A.22). 


A.10 (Poincaré inequality) Let A be a $d \times d$ Hermitian matrix. Let $a_i$ be the
eigenvalues of A ordered from largest to smallest. Show that $\min_{x \in \mathcal{K}, \|x\| = 1} \langle x | A | x \rangle \le a_k$
for any k-dimensional subspace $\mathcal{K}$.


A.11 Show that $\max_{P : \mathrm{rank}\, P = k} \min_x \frac{\langle x | PAP | x \rangle}{\langle x | P | x \rangle} = a_k$ under the same conditions as above.


A.12 Let A and B be Hermitian matrices, and let a; and b; be their ordered eigen- 
values from largest to smallest. Show that a; > b; if A > B. 


A.13 Assume that $\frac{1}{p} + \frac{1}{q} = 1$ and $X \ge 0$. Show the following relations by using
the matrix Hölder inequality (A.26) and the matrix reverse Hölder inequality (A.28)


1 
pe algae et (A.31) 
1 
oe aa . A32 
zez=1 |X ||, for p < (A.32) 


A.4 Convex Functions and Matrix Convex Functions 


Linear functions are often used in linear algebra. On the other hand, functions such
as $x^2$ and $\exp(x)$ do not satisfy the linearity property. If we denote such functions by
f, then they instead satisfy

$$
f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2), \quad 0 \le \forall \lambda \le 1,\ \forall x_1, x_2 \in \mathbb{R}.
$$


A function is called a convex function when it satisfies the above inequality. If — f 
is a convex function, then f is called a concave function. In the above, its domain 
is restricted to real numbers. However, this restriction is not necessary and may be 
defined in a more general way. For example, for a vector space, we may define the 
convex combination $\lambda v_1 + (1 - \lambda) v_2$ for two vectors $v_1$ and $v_2$ with $0 \le \lambda \le 1$.
More generally, a set is called a convex set when the convex combination of any
two elements is defined. Further, a convex set L is called a convex cone if $v \in L$
and $\lambda \ge 0$ imply $\lambda v \in L$. Therefore, it is possible to define convex and concave
functions for functions with a vector space domain and a real number range. Similarly,
convex and concave functions may be defined with a convex set domain. Examples
of convex sets are the set of probability distributions and the set of density matrices.
In particular, an element v of the convex set V is called an extremal point if $v_i \in V$
and $v = \lambda v_1 + (1 - \lambda) v_2$ ($0 \le \lambda \le 1$) imply $\lambda = 1$ or 0. For example, a pure state
is an extremal point in the set of density matrices. When the convex set V is closed,
any point $v \in V$ can be written as a convex combination of extremal points. Such an
expression of v is called the extremal point decomposition of v. The extremal point
decomposition characterizes the property of the point v. Further, for a given subset
S of a convex set V, we can define the convex subset Co(S) of V as the set of convex
combinations of elements of S. The convex subset Co(S) is called the convex hull
of S.
Here, we prepare several important properties of convex functions.


Lemma A.6 Let a convex function f be defined on an open convex subset V of
$\mathbb{R}^d$. Then, f is continuous.


Proof For a point x of V, we choose $d + 1$ points $y_i$ in V such that $x = \sum_{i=1}^{d+1} \frac{1}{d+1} y_i$.
When a point z in the convex hull of $\{y_i\}$ is close to x, we can choose two positive
numbers $a_1$ and $a_2$ in $(0, 1)$ that are close to 1 and non-negative numbers $b_{1,i}$ and
$b_{2,i}$ such that $z = a_1 x + \sum_i b_{1,i} y_i$ and $x = a_2 z + \sum_i b_{2,i} y_i$. Then, we have $f(z) \le
a_1 f(x) + \sum_i b_{1,i} f(y_i)$ and $f(x) \le a_2 f(z) + \sum_i b_{2,i} f(y_i)$. Thus,

$$
\frac{1 - a_2}{a_2} f(x) - \frac{1}{a_2} \sum_i b_{2,i} f(y_i) \le f(z) - f(x) \le (a_1 - 1) f(x) + \sum_i b_{1,i} f(y_i). \tag{A.33}
$$

When $z \to x$, we have $a_1, a_2 \to 1$ and $b_{1,i}, b_{2,i} \to 0$. So, we obtain the continuity
of f at x. ∎


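Lemma A.7 Let V be a compact convex subset of $\mathbb{R}^d$ and f be a convex function
defined on the interior $\mathring{V}$ of V. Assume that when a sequence $\{x_n\}$ in $\mathring{V}$ converges
to the boundary of V, the value $f(x_n)$ goes to $+\infty$. Then, the convex function f has
the minimum.


Proof Assume that the function f does not have the minimum. Since Lemma
A.6 guarantees the continuity in $\mathring{V}$, there exists a sequence $\{x_n\}$ in $\mathring{V}$ such that
$\lim_{n \to \infty} f(x_n) = \inf_{x \in \mathring{V}} f(x)$. Since V is compact, there exists a subsequence
$\{x_{n_k}\}$ converging in V. If the limit point belonged to $\mathring{V}$, the continuity of f would imply
that the infimum is attained there, so the limit point lies on the boundary of V.
However, then $\lim_{k \to \infty} f(x_{n_k}) = +\infty$, which contradicts the assumption. ∎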


Lemma A.8 Let f be a convex function on the convex set V. For any element $v_0$ of
V, there exists a linear function g such that

$$
g(v_0) - f(v_0) = \max_{v \in V} g(v) - f(v).
$$

When f is differentiable, g coincides with the derivative of f at $v_0$. Further, for any
linear function g and a constant $C_0 > 0$, there exists the Lagrange multiplier $\lambda$ such
that

$$
\max_{v \in V} f(v) + \lambda g(v) = \max_{v \in V : g(v) \le C_0} f(v) + \lambda g(v).
$$

In this case, $\lambda g$ coincides with the derivative of f at $\mathrm{argmax}_{v \in V : g(v) \le C_0} f(v)$.


Lemma A.9 ([4, Chap. VI Prop. 2.3]) Consider two vector spaces $V_1$ and $V_2$ and
consider a real-valued function $f(v_1, v_2)$ with the domain $V_1 \times V_2$. If f is convex
with respect to $v_2$ and concave with respect to $v_1$, then¹

$$
\sup_{v_1 \in S_1} \min_{v_2 \in S_2} f(v_1, v_2) = \min_{v_2 \in S_2} \sup_{v_1 \in S_1} f(v_1, v_2),
$$

where $S_1$ and $S_2$ are convex subsets of $V_1$ and $V_2$.
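Next, we focus on the set of probability distributions on $\mathcal{S}(\mathcal{H})$ and denote it by
$\mathcal{P}(\mathcal{S}(\mathcal{H}))$. In particular, we consider extremal points of the set

$$
\mathcal{P}(\rho, \mathcal{S}(\mathcal{H})) := \Bigl\{ p \in \mathcal{P}(\mathcal{S}(\mathcal{H})) \Bigm| \sum_i p_i \rho_i = \rho \Bigr\}.
$$

Such extremal points of the above set are characterized as follows.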

Lemma A.10 (Fujiwara and Nagaoka [5]) Let $p \in \mathcal{P}(\rho, \mathcal{S}(\mathcal{H}))$ be an extremal point
and $\{\rho_1, \ldots, \rho_k\}$ be the support of p. Then, $\rho_1, \ldots, \rho_k$ are linearly independent.
Hence, the number of supports of p is at most $\dim \mathcal{T}(\mathcal{H}) = (\dim \mathcal{H})^2$.


Note that we obtain the same result when we replace $\mathcal{P}(\rho, \mathcal{S}(\mathcal{H}))$ by $\mathcal{P}(\mathcal{S}(\mathcal{H}))$.


Proof Assume that $\rho_1, \ldots, \rho_k$ are linearly dependent. That is, we choose real numbers
$\lambda_1, \ldots, \lambda_k$, not all zero, such that $\sum_i \lambda_i \rho_i = 0$ and $\sum_i \lambda_i = 0$. Define two distributions
$q^+$ and $q^-$ with the same support by

$$
q_i^{\pm} := p_i \pm \epsilon \lambda_i. \tag{A.34}
$$

Then, we have $p = \frac{1}{2} q^+ + \frac{1}{2} q^-$ and $q^+ \neq q^-$. It is a contradiction. ∎


Indeed, applying this lemma to $\rho_{\mathrm{mix}}$, we can see that any extremal POVM has at most
$(\dim \mathcal{H})^2$ elements. So, the set of extremal points is compact. Thus, we have the
following lemma.


Lemma A.11 A continuous convex function f for a POVM M has the minimum 
miny f(M). 


Further, we focus on the cost functions f),..., f; on S(#) and treat the following 
sets: 


! This relation holds even if V, is infinite dimensional, as long as Sz is a closed and bounded set. 
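$$
\mathcal{P}_{=(\le)c}(\rho, f, \mathcal{S}(\mathcal{H})) := \Bigl\{ p \in \mathcal{P}(\rho, \mathcal{S}(\mathcal{H})) \Bigm| \sum_i p_i f_j(\rho_i) = (\le)\, c\ \ \forall j = 1, \ldots, l \Bigr\},
$$

$$
\mathcal{P}_{=(\le)c}(f, \mathcal{S}(\mathcal{H})) := \Bigl\{ p \in \mathcal{P}(\mathcal{S}(\mathcal{H})) \Bigm| \sum_i p_i f_j(\rho_i) = (\le)\, c\ \ \forall j = 1, \ldots, l \Bigr\}.
$$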


Lemma A.12 (Fujiwara and Nagaoka [5]) Let p be an extremal point of one of the 
above sets. Then, the number of supports of p is less than (l + 1)(dim H)?. 


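Using the convex function, we can show the following lemma.
Lemma A.13 When $0 \le A \le B$, we have

$$
\mathrm{Tr}\, A^s \le \mathrm{Tr}\, B^s \tag{A.35}
$$

for $s > 0$.


Proof Since the function $x \mapsto x^s$ is matrix monotone with $s \in [0, 1]$, (A.35) holds
in this case. Assume that $s > 1$. We make the diagonalizations $A = \sum_j a_j |u_j\rangle\langle u_j|$
and $B = \sum_l b_l |v_l\rangle\langle v_l|$. Since $\sum_l \mathrm{Tr}\, |v_l\rangle\langle v_l | u_j\rangle\langle u_j| = 1$ and the function $x \mapsto x^s$ is
convex, we have

$$
\begin{aligned}
\mathrm{Tr}\, B^s &= \sum_l b_l^s\, \mathrm{Tr}\, |v_l\rangle\langle v_l| = \sum_j \sum_l b_l^s\, \mathrm{Tr}\, |v_l\rangle\langle v_l | u_j\rangle\langle u_j| \\
&\ge \sum_j \Bigl( \sum_l b_l\, \mathrm{Tr}\, |v_l\rangle\langle v_l | u_j\rangle\langle u_j| \Bigr)^s = \sum_j \bigl( \mathrm{Tr}\, B |u_j\rangle\langle u_j| \bigr)^s \\
&\ge \sum_j \bigl( \mathrm{Tr}\, A |u_j\rangle\langle u_j| \bigr)^s = \sum_j a_j^s = \mathrm{Tr}\, A^s.
\end{aligned}
$$

∎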


The concept of “convex function” can be extended to functions of matrices. If a
function f with the range $[0, \infty)$ satisfies

$$
\lambda f(A) + (1 - \lambda) f(B) \ge f(\lambda A + (1 - \lambda) B)
$$

for arbitrary Hermitian matrices A, B with eigenvalues in $[0, \infty)$, it is called a matrix
convex function. See Sect. 1.5 for the definition of $f(A)$. Also, the function f is
called a matrix concave function when the function $-f$ is a matrix convex function.
The following equivalences are known for a function from $(0, \infty)$ to $(0, \infty)$ [6]:


① $f(t)$ is matrix monotone.
② $t/f(t)$ is matrix monotone.
③ $f(t)$ is matrix concave.

Furthermore, it is known that if the function f satisfies one of the above conditions,
$1/f(t)$ is matrix convex [6]. Hence, since the functions $t^s$, $-t^{-s}$ ($s \in [0, 1]$), and
$\log t$ are matrix monotone, the functions $t^s$ ($s \in [-1, 0] \cup [1, 2]$), $-t^s$ ($s \in [0, 1]$),
$-\log t$, and $t \log t$ are matrix convex functions. The following theorem is known.


Theorem A.1 ([2, 6]) The following conditions are equivalent for a function f.


① $f(t)$ is matrix convex on $[0, \infty)$.

② When a matrix Z satisfies $Z^*Z = I$, any Hermitian matrix X with eigenvalues
in $[0, \infty)$ satisfies $f(Z^* X Z) \le Z^* f(X) Z$.

③ When matrices $Z_1, \ldots, Z_k$ satisfy $\sum_i Z_i^* Z_i = I$, any Hermitian matrices
$X_1, \ldots, X_k$ with eigenvalues in $[0, \infty)$ satisfy $f\bigl(\sum_i Z_i^* X_i Z_i\bigr) \le \sum_i Z_i^* f(X_i) Z_i$.


As its consequences, we have the following corollaries. 


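Corollary A.1 Assume that $f(t)$ is matrix convex on $[0, \infty)$. Given a Hermitian matrix X on
$\mathcal{H}_A \otimes \mathcal{H}_B$ and a state $\rho_0$ on $\mathcal{H}_B$, we have $f(\mathrm{Tr}_B (I \otimes \rho_0) X) \le \mathrm{Tr}_B (I \otimes \rho_0) f(X)$.


Proof of Corollary A.1 Assume ③. Consider the spectral decomposition $\rho_0 =
\sum_i p_i |u_i\rangle\langle u_i|$. Choose the map $Z_i : |v\rangle \mapsto \sqrt{p_i}\, |v\rangle \otimes |u_i\rangle$. We have $\mathrm{Tr}_B (I \otimes \rho_0)
f(X) = \sum_i Z_i^* f(X) Z_i$ and $\mathrm{Tr}_B (I \otimes \rho_0) X = \sum_i Z_i^* X Z_i$. So, we obtain the desired
argument. ∎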


Corollary A.2 Assume that $f(t)$ is matrix convex on $[0, \infty)$ and that $f(0) = 0$ or
$\lim_{t \to \infty} f(t) = 0$. When a matrix C satisfies $C^*C \le I$, any Hermitian matrix A with
eigenvalues in $[0, \infty)$ satisfies $f(C^* A C) \le C^* f(A) C$.


Proof of Corollary A.2 Choose another matrix $B := \sqrt{I - C^*C}$. When $f(0) = 0$,
$f(C^* A C) = f(C^* A C) + f(B^* 0 B) \le C^* f(A) C + B^* f(0) B = C^* f(A) C$.
Similarly, when $\lim_{t \to \infty} f(t) = 0$, $f(C^* A C) + f(B^* t B) \le C^* f(A) C + B^*
f(t) B$ for any positive real $t > 0$. Taking the limit $t \to \infty$, we obtain the desired
argument. ∎


Now, we focus on the equation

$$\frac{\pi}{\sin p\pi} = \int_0^\infty \frac{t^{p-1}}{1+t}\,dt \tag{A.36}$$

for $p \in (0, 1)$. For each $x > 0$, substituting $u/x$ into $t$ in the above, we obtain the
decomposition of the matrix convex function $-x^p$ as

$$-x^p = \frac{\sin p\pi}{\pi} \int_0^\infty \frac{-x\,u^{p-1}}{u+x}\,du
= \frac{\sin p\pi}{\pi} \int_0^\infty u^{p-1}\Bigl(\frac{u}{u+x} - 1\Bigr) du. \tag{A.37}$$

Multiplying by $-x^{-1}$, we also have the decomposition of the matrix convex function $x^{p-1}$
as

$$x^{p-1} = \frac{\sin p\pi}{\pi} \int_0^\infty \frac{u^{p-1}}{u+x}\,du. \tag{A.38}$$

This relation shows that the function $x^s$ with $s \in [-1, 0]$ can be written as the positive
sum of a family of matrix convex functions $\{\frac{1}{u+x}\}_{u \ge 0}$.
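
The integral representation (A.38) can be checked numerically for scalar $x$. The sketch below is an illustration under stated assumptions: the function name `power_via_integral`, the truncation width, and the step size are choices made here, not taken from the text. Substituting $u = e^s$ turns the integral into $\int_{-\infty}^{\infty} e^{ps}/(e^s + x)\,ds$, whose integrand decays exponentially in both directions, so a plain trapezoidal rule on a truncated interval suffices.

```python
import numpy as np

def power_via_integral(x, p, half_width=80.0, step=0.01):
    """Approximate x**(p-1) by (sin(p*pi)/pi) * int_0^inf u**(p-1)/(u+x) du.

    With u = exp(s), u**(p-1) du becomes exp(p*s) ds, so the integrand on the
    real line is exp(p*s) / (exp(s) + x).
    """
    s = np.arange(-half_width, half_width + step, step)
    integrand = np.exp(p * s) / (np.exp(s) + x)
    # Trapezoidal rule on the uniform grid.
    integral = step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return np.sin(p * np.pi) / np.pi * integral

approx_half = power_via_integral(2.0, 0.5)    # compare with 2**(-0.5)
approx_other = power_via_integral(5.0, 0.3)   # compare with 5**(-0.7)
```

Because each integrand $1/(u+x)$ is matrix convex in $x$ and the weights $u^{p-1}\sin p\pi/\pi$ are positive, the same formula applied to a positive definite matrix explains the matrix convexity of $x^{p-1}$.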


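Next, we consider the matrix convex function $x^{1+p}$. Since

$$\frac{x}{u+x} = \frac{1}{u^2+1} + u\Bigl(\frac{u}{u^2+1} - \frac{1}{u+x}\Bigr), \tag{A.39}$$

we obtain

$$x^p = \frac{\sin p\pi}{\pi} \int_0^\infty \frac{u^{p-1}}{u^2+1}\,du
+ \frac{\sin p\pi}{\pi} \int_0^\infty u^p \Bigl(\frac{u}{u^2+1} - \frac{1}{u+x}\Bigr) du
= \cos\frac{p\pi}{2} + \frac{\sin p\pi}{\pi} \int_0^\infty u^p \Bigl(\frac{u}{u^2+1} - \frac{1}{u+x}\Bigr) du \tag{A.40}$$

because the relation $\int_0^\infty \frac{u^{p-1}}{u^2+1}\,du = \frac{\pi}{2 \sin\frac{p\pi}{2}}$ follows from (A.36) by replacing $t$ and $p$
with $u^2$ and $\frac{p}{2}$, respectively, and $\sin p\pi = 2 \sin\frac{p\pi}{2} \cos\frac{p\pi}{2}$. So,

$$x^{1+p} = \cos\frac{p\pi}{2}\,x + \frac{\sin p\pi}{\pi} \int_0^\infty u^p x \Bigl(\frac{u}{u^2+1} - \frac{1}{u+x}\Bigr) du
= \cos\frac{p\pi}{2}\,x + \frac{\sin p\pi}{\pi} \int_0^\infty u^p \Bigl(\frac{ux}{u^2+1} - 1 + \frac{u}{u+x}\Bigr) du. \tag{A.41}$$

This expression shows that the non-linear factor of the function $x^{1+p}$ can be reduced to
the functions $\{\frac{1}{u+x}\}_{u \ge 0}$.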

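As another example, the matrix convex functions $-\log x$ and $x \log x$ can be decomposed as

$$-\log x = \int_0^\infty \Bigl(\frac{1}{x+t} - \frac{1}{1+t}\Bigr) dt, \tag{A.42}$$

$$x \log x = \int_0^\infty \Bigl(-\frac{x}{x+t} + \frac{x}{1+t}\Bigr) dt
= \int_0^\infty \Bigl(\frac{t}{x+t} + \frac{x}{1+t} - 1\Bigr) dt. \tag{A.43}$$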


Generally, we have the following expression.


Theorem A.2 ([7, Theorem 5.1], [6, Problem V.5.5]) Let $f$ be a matrix convex
function defined on $(0, \infty)$. There exists a positive measure $\mu$ on $[0, \infty)$ such that

$$f(x) = f(1) + f'(1)(x-1) + b(x-1)^2 + \int_0^\infty \frac{(x-1)^2}{u+x}\,\mu(du)
= f(1) + f'(1)(x-1) + b(x-1)^2 + \int_0^\infty \Bigl(x - (u+2) + \frac{(u+1)^2}{u+x}\Bigr) \mu(du), \tag{A.44}$$

where $b = \lim_{x \to \infty} \frac{f(x)}{x^2} \ge 0$. When $f$ is sub-linear, i.e., $f(x)/x \to 0$ as $x \to \infty$,
there exists a positive measure $\mu$ on $[0, \infty)$ such that

$$f(x) = f(1) + \int_0^\infty \Bigl(\frac{1}{u+x} - \frac{1}{u+1}\Bigr) \mu(du). \tag{A.45}$$


Let $f$ be a matrix convex function defined on $[0, \infty)$. There exist a constant $a$ and
a positive measure $\mu$ on $(0, \infty)$ such that

$$f(x) = f(0) + ax + bx^2 + \int_0^\infty \frac{x^2}{u+x}\,\mu(du)
= f(0) + ax + bx^2 + \int_0^\infty \Bigl(x - u + \frac{u^2}{u+x}\Bigr) \mu(du), \tag{A.46}$$

where $b = \lim_{x \to \infty} \frac{f(x)}{x^2} \ge 0$. When $f$ is sub-linear, there exists a
positive measure $\mu$ on $(0, \infty)$ such that

$$f(x) = f(0) + \int_0^\infty \Bigl(\frac{1}{u+x} - \frac{1}{u}\Bigr) \mu(du). \tag{A.47}$$

Therefore, the non-linear factor of a matrix convex function can be reduced to the
functions $\{\frac{1}{u+x}\}_{u \ge 0}$ and $x^2$. That is, the set of matrix convex functions defined on
$(0, \infty)$ forms a convex set, and its extremal points are given as the functions $\{\frac{1}{u+x}\}_{u \ge 0}$
and $x^2$. Theorem A.2 contains the extremal point decompositions in four types of
matrix convex functions. In particular, the two functions $\frac{1}{x}$ and $x^2$ play a special role.
The sub-linearity corresponds to the absence of the factor $x^2$, and the extendability
of the domain to $x = 0$ corresponds to the absence of the factor $\frac{1}{x}$.


Remark A.1 Although matrix convex functions are very important mathematical
objects, no textbook covers them completely, including the extremal point decomposition.
As shown in (A.44) and (A.46), the extremal point decomposition depends on the
domain. The paper [7, Theorem 5.1] gives the extremal point decomposition (A.44).
The book [6, Problem V.5.5] gives (A.46) when the derivative $f'(0)$ exists. However,
the derivative $f'(0)$ does not exist in general [8]. The current form (A.46) was
obtained by Hiai [8]. When we impose the sub-linearity, the coefficients of $x$ and
$x^2$ vanish. So, we have the extremal point decompositions (A.45) and (A.47). When
the domain is $[-1, 1]$, we have another type of extremal point decomposition [2,
Theorem 4.40], [9, Theorem 2.7.6].


Exercises 


A.14 Show that an extremal point decomposition of an arbitrary density matrix $\rho$
is not unique in the set of density matrices when $\rho$ is not a pure state. That is, give at
least two extremal point decompositions of a density matrix $\rho$.


A.15 Show the concavity of the von Neumann entropy (5.77) using the matrix
convexity of $x \log x$.

A.16 Show that the inequality $\phi(s|\rho\|\sigma) \ge \phi(s|\kappa(\rho)\|\kappa(\sigma))$ does not hold in general
with the parameter $s \in (-\infty, -1)$.


A.5 Solutions of Exercises 


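Exercise A.1 Choose a basis $\{|e_i\rangle\}$ of the subspace spanned by $|u_1\rangle, \ldots, |u_k\rangle$. Define
the matrix $A = (a_{i,j})$ by $|u_i\rangle = \sum_j a_{i,j} |e_j\rangle$. Then, $J_{i,j} = \langle u_j|u_i\rangle = \sum_k a_{i,k} \overline{a_{j,k}}$.
That is, $J = AA^*$, which implies $J^{-1} = (A^{-1})^* A^{-1}$. Since $J = J^*$, we have $J^{-1} =
(J^{-1})^*$. Thus, (A.12) follows. Hence,

$$\sum_{i,j} (J^{-1})_{j,i}\, |u_i\rangle\langle u_j|
= \sum_{i,j} \sum_{k,l,n} \overline{(A^{-1})_{n,j}}\, (A^{-1})_{n,i}\, a_{i,k}\, \overline{a_{j,l}}\, |e_k\rangle\langle e_l|
= \sum_{k,l,n} \delta_{n,k} \delta_{n,l}\, |e_k\rangle\langle e_l| = \sum_n |e_n\rangle\langle e_n|.$$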

Exercise A.2 Using the polar decomposition $A = U|A|$, we have

$$AA^* f(AA^*) = U|A|^2 U^* f(U|A|^2 U^*) = U|A|^2 U^* U f(|A|^2) U^*
= U|A|^2 f(|A|^2) U^* = U|A| f(|A|^2) |A| U^* = A f(A^* A) A^*.$$


Exercise A.3 The eigen equation is $(-a - x)(a - x) - |b|^2 = 0$, which is equivalent
to $x^2 = |b|^2 + a^2$. Hence, the eigenvalues are $\pm\sqrt{|b|^2 + a^2}$. So, the trace of
$\left| \begin{pmatrix} -a & b \\ \bar{b} & a \end{pmatrix} \right|$ is equal to $2\sqrt{|b|^2 + a^2}$.


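Exercise A.4 Using the definition (A.18), we have

$$\|X\|_1 = \max_{U_{AB}} \mathrm{Tr}\, U_{AB} X \ge \max_{U_A} \mathrm{Tr}\, (U_A \otimes I_B) X
= \max_{U_A} \mathrm{Tr}\, U_A \mathrm{Tr}_B X = \|\mathrm{Tr}_B X\|_1.$$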


Exercise A.5 Since $BA = A^{-1}(AB)A$, we have $(BA - xI) = A^{-1}(AB - xI)A$.
Hence, the kernel of $(BA - xI)$ has the same dimension as that of $(AB - xI)$. Thus,
the eigenvalues of $BA$ are the same as the eigenvalues of $AB$ including degeneracies.


Exercise A.6 If $A$ possesses the inverse, Exercise A.5 yields the desired argument. If
$A$ does not possess the inverse, we choose an invertible matrix $A_\epsilon$ approximating $A$.
Since the eigenvalue is a continuous function of a matrix, we have $\lim_{\epsilon \to 0} \mathrm{spr}(A_\epsilon B) =
\mathrm{spr}(AB)$ and $\lim_{\epsilon \to 0} \mathrm{spr}(B A_\epsilon) = \mathrm{spr}(BA)$, which implies the desired argument.
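
The statements of Exercises A.5 and A.6 can be illustrated numerically, including the case of a singular $A$. The following sketch is an illustration only: the matrix size, the seed, and the way $A$ is made singular are assumptions of this example. It compares the characteristic polynomials of $AB$ and $BA$, which coincide exactly when the eigenvalues agree including multiplicities.

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.normal(size=(4, 4))
A[:, 0] = A[:, 1]          # two equal columns make A singular
B = rng.normal(size=(4, 4))

# Coefficients of the characteristic polynomials of AB and BA.
char_ab = np.poly(A @ B)
char_ba = np.poly(B @ A)

# Spectral radii spr(AB) and spr(BA).
spr_ab = np.abs(np.linalg.eigvals(A @ B)).max()
spr_ba = np.abs(np.linalg.eigvals(B @ A)).max()
```

Here `np.poly` receives a square matrix and returns the coefficients of its characteristic polynomial, so agreement of `char_ab` and `char_ba` reflects equality of the spectra even though $A^{-1}$ does not exist.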


Exercise A.7 


(a) Since (1.34) implies that

$$I = B^{-1/2} B B^{-1/2} \ge B^{-1/2} A B^{-1/2} = (A^{1/2} B^{-1/2})^* (A^{1/2} B^{-1/2}),$$

(A.17) yields that

$$\|A^{1/2} B^{-1/2}\| = \Bigl\|\sqrt{(A^{1/2} B^{-1/2})^* (A^{1/2} B^{-1/2})}\Bigr\| \le \|I\| = 1.$$

(b) Exercise A.6 and (A.16) yield that

$$\mathrm{spr}(B^{-1/4} A^{1/2} B^{-1/4}) = \mathrm{spr}(A^{1/2} B^{-1/2}) \le \|A^{1/2} B^{-1/2}\| \le 1.$$

(c) The relation $1 \ge \mathrm{spr}(B^{-1/4} A^{1/2} B^{-1/4})$ implies that $B^{-1/4} A^{1/2} B^{-1/4} \le I$. Hence,
we have $A^{1/2} = B^{1/4} B^{-1/4} A^{1/2} B^{-1/4} B^{1/4} \le B^{1/4} I B^{1/4} = B^{1/2}$.

(d) Since $B + \epsilon I \ge A$, we have $A^{1/2} \le (B + \epsilon I)^{1/2}$. Taking the limit $\epsilon \to 0$, we
have $A^{1/2} \le B^{1/2}$.
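
The conclusion $A^{1/2} \le B^{1/2}$ for $B \ge A \ge 0$ can be illustrated with a small example, which also shows that the analogous statement for squares fails. This is a minimal sketch: the two matrices and the helper name `psd_sqrt` are assumptions chosen for illustration, not taken from the text.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    eigenvalues, eigenvectors = np.linalg.eigh(a)
    # Clip rounding-level negative eigenvalues before taking the root.
    roots = np.sqrt(np.clip(eigenvalues, 0.0, None))
    return (eigenvectors * roots) @ eigenvectors.conj().T

A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])   # B - A = diag(1, 0) >= 0

min_eig_order = np.linalg.eigvalsh(B - A).min()                      # B >= A
min_eig_sqrt = np.linalg.eigvalsh(psd_sqrt(B) - psd_sqrt(A)).min()   # B^{1/2} >= A^{1/2}
min_eig_square = np.linalg.eigvalsh(B @ B - A @ A).min()             # B^2 >= A^2 fails here
```

For this pair, $B^2 - A^2$ has rows $(3, 1)$ and $(1, 0)$ with determinant $-1$, so $t \mapsto t^2$ is not matrix monotone, whereas the square root preserves the order as the exercise shows.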


Exercise A.8 


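(a) Let $\lambda_i \ge 0$ be the eigenvalue of $|X|$ associated with $u_i$. Then, we have
$\max_{v: \|v\| = 1} \langle v| |X| |u_i\rangle = \max_{v: \|v\| = 1} \lambda_i \langle v|u_i\rangle = \lambda_i \langle u_i|u_i\rangle = \langle u_i| |X| |u_i\rangle$.
(b) We have

$$\max_{U: \text{unitary}} \langle u_i|UX|u_i\rangle = \max_{U: \text{unitary}} \langle u_i|U U_X |X| |u_i\rangle
= \max_{v: \|v\| = 1} \langle v| |X| |u_i\rangle = \langle u_i| |X| |u_i\rangle.$$

Since the above maximum can be realized with $U = U_X^*$, we have $\langle u_i| |X| |u_i\rangle =
\langle u_i|U_X^* X|u_i\rangle$.
(c) We have

$$\max_{U: \text{unitary}} \mathrm{Tr}\, UX = \max_{U: \text{unitary}} \sum_i \langle u_i|UX|u_i\rangle
= \sum_i \langle u_i| |X| |u_i\rangle = \mathrm{Tr}\, |X| = \mathrm{Tr}\, U_X^* X.$$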


Exercise A.9 We choose the unitary $U_X$ by using the polar decomposition $X =
U_X|X|$, and the basis $|u_i\rangle$ as the eigenvectors of $|X|$. Then,


WX, 2 max TrUYX= max TrUYU‘UUy|X| 
u:unitary u:unitary 


= max (uj|UYU'UUx|X||u;) 
; U:unitary 

(b) + 

<>) max ||UYUTUUs||_ max (u||X||u;) 
; U:unitary u:|lul|=1 


©) t @) 
=>) max |UYUTUU || (wil X lui) < WY WIX IL, 
;_ U:unitary 
where (a), (b), (c), and (d) follow from (A.18), (A.14), (a) of Exercise A.8, and
(A.15), respectively.


Exercise A.10 Let $\mathcal{K}'$ be the $(d - k + 1)$-dimensional subspace spanned by the
eigenvectors corresponding to the eigenvalues $a_k, \ldots, a_d$. Then, for any $k$-dimensional
subspace $\mathcal{K}$, the intersection space $\mathcal{K} \cap \mathcal{K}'$ has at least dimension 1. So,

$$\min_{x \in \mathcal{K}, \|x\| = 1} \langle x|A|x\rangle \le \min_{x \in \mathcal{K} \cap \mathcal{K}': \|x\| = 1} \langle x|A|x\rangle \le a_k.$$


Exercise A.11 When $\mathcal{K}$ is the image of a $k$-dimensional projection $P$, Exercise A.10
yields that

$$\min_x \frac{\langle x|PAP|x\rangle}{\langle x|P|x\rangle} = \min_{x \in \mathcal{K}, \|x\| = 1} \langle x|A|x\rangle \le a_k.$$

Taking the maximum, we obtain

$$\max_{P: \mathrm{rank}\, P = k} \min_x \frac{\langle x|PAP|x\rangle}{\langle x|P|x\rangle} \le a_k.$$

The equality holds when $P$ is the projection onto the space spanned by the eigenvectors
corresponding to the eigenvalues $a_1, \ldots, a_k$.


Exercise A.12 Exercise A.11 yields that

$$a_k = \max_{P: \mathrm{rank}\, P = k} \min_x \frac{\langle x|PAP|x\rangle}{\langle x|P|x\rangle}
\ge \max_{P: \mathrm{rank}\, P = k} \min_x \frac{\langle x|PBP|x\rangle}{\langle x|P|x\rangle} = b_k.$$
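
The max-min characterization of Exercise A.11 and the resulting eigenvalue ordering of Exercise A.12 can be checked on a random instance. In this sketch the dimension $d = 5$, the index $k = 2$, the seed, and the construction $A = B + CC^*$ (which guarantees $A \ge B$) are assumptions made for illustration; eigenvalues are sorted in decreasing order as $a_1 \ge \cdots \ge a_d$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 5, 2

# Random Hermitian B and A = B + C C^*, so that A >= B.
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = (G + G.conj().T) / 2
C = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = B + C @ C.conj().T

# Eigenvalues in decreasing order: a[0] = a_1 >= ... >= a[d-1] = a_d.
a = np.linalg.eigvalsh(A)[::-1]
b = np.linalg.eigvalsh(B)[::-1]

# Restrict A to the span of the eigenvectors for a_1, ..., a_k; the minimum
# of <x|A|x> over unit vectors in that subspace should equal a_k.
eigenvalues, eigenvectors = np.linalg.eigh(A)
top_k = eigenvectors[:, ::-1][:, :k]
restricted_min = np.linalg.eigvalsh(top_k.conj().T @ A @ top_k).min()
```

The array comparison `a >= b` corresponds to $a_k \ge b_k$ for every $k$, and `restricted_min` realizes the equality case described in Exercise A.11.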


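Exercise A.13 The matrix Hölder inequality (A.26) yields that

$$\mathrm{Tr}\, Z^{\frac{1}{q}} X \le \|Z^{\frac{1}{q}}\|_q \|X\|_p = \|X\|_p$$

for $p > 1$ and a matrix $Z \ge 0$ satisfying $\mathrm{Tr}\, Z = 1$. When $Z = \frac{X^p}{\mathrm{Tr}\, X^p}$, we have

$$\mathrm{Tr}\, Z^{\frac{1}{q}} X = \mathrm{Tr}\, \frac{X^p}{(\mathrm{Tr}\, X^p)^{\frac{1}{q}}} = (\mathrm{Tr}\, X^p)^{\frac{1}{p}}, \tag{A.48}$$

which implies (A.31).
The matrix reverse Hölder inequality (A.28) yields that

$$\mathrm{Tr}\, Z^{\frac{1}{q}} X \ge \|Z^{\frac{1}{q}}\|_q \|X\|_p = \|X\|_p$$

for $p < 1$ and a matrix $Z > 0$ satisfying $\mathrm{Tr}\, Z = 1$. When $Z = \frac{X^p}{\mathrm{Tr}\, X^p}$, we have (A.48),
which implies (A.32).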
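Exercise A.14 We make a spectral decomposition of $\rho$ as $\rho = \sum_i \lambda_i |u_i\rangle\langle u_i|$, which is
an extremal point decomposition of the density matrix $\rho$. To make another extremal
point decomposition of the density matrix $\rho$, we assume that $\lambda_1 \ge \lambda_2 > 0$ without loss
of generality. So, using $|u_\pm\rangle := \frac{1}{\sqrt{2}}(|u_1\rangle \pm |u_2\rangle)$, we have

$$\lambda_1 |u_1\rangle\langle u_1| + \lambda_2 |u_2\rangle\langle u_2|
= (\lambda_1 - \lambda_2) |u_1\rangle\langle u_1| + \lambda_2 (|u_1\rangle\langle u_1| + |u_2\rangle\langle u_2|)
= (\lambda_1 - \lambda_2) |u_1\rangle\langle u_1| + \lambda_2 (|u_+\rangle\langle u_+| + |u_-\rangle\langle u_-|).$$

So, $(\lambda_1 - \lambda_2) |u_1\rangle\langle u_1| + \lambda_2 (|u_+\rangle\langle u_+| + |u_-\rangle\langle u_-|) + \sum_{i \ge 3} \lambda_i |u_i\rangle\langle u_i|$ is another
extremal point decomposition of the density matrix $\rho$.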
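
A concrete qubit instance makes the non-uniqueness explicit. The following sketch assumes the illustrative values $\lambda_1 = 0.7$, $\lambda_2 = 0.3$ and the standard basis as eigenvectors; the helper name `projector` is also introduced only for this example.

```python
import numpy as np

def projector(v):
    """Rank-one projector |v><v| onto a normalized vector v."""
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

lam1, lam2 = 0.7, 0.3
u1 = np.array([1.0, 0.0])
u2 = np.array([0.0, 1.0])
u_plus = (u1 + u2) / np.sqrt(2)
u_minus = (u1 - u2) / np.sqrt(2)

# Spectral decomposition: two pure states with weights (0.7, 0.3).
rho_spectral = lam1 * projector(u1) + lam2 * projector(u2)

# Alternative decomposition: three pure states with weights (0.4, 0.3, 0.3).
weights = [lam1 - lam2, lam2, lam2]
states = [projector(u1), projector(u_plus), projector(u_minus)]
rho_alternative = sum(w * s for w, s in zip(weights, states))
```

Both expressions give the same density matrix, although the sets of pure states and weights differ.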


Exercise A.15 Since $x \log x$ is matrix convex, two density matrices $\rho_1$ and $\rho_2$ and a
real number $p \in (0, 1)$ satisfy

$$(p\rho_1 + (1 - p)\rho_2) \log (p\rho_1 + (1 - p)\rho_2) \le p\rho_1 \log \rho_1 + (1 - p)\rho_2 \log \rho_2.$$

Taking the trace, we obtain the concavity of the von Neumann entropy (5.77).


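Exercise A.16 We show the desired argument by contradiction. Consider
the states $\rho' := p\rho_1 \otimes |1\rangle\langle 1| + (1 - p)\rho_2 \otimes |2\rangle\langle 2|$ and $\sigma' := p\sigma \otimes |1\rangle\langle 1| + (1 - p)\sigma \otimes
|2\rangle\langle 2|$ with an arbitrary state $\sigma$.

Apply the assumption of contradiction to the partial trace. Then,

$$\mathrm{Tr}\bigl((p\rho_1 + (1 - p)\rho_2)^{1-s} \sigma^s\bigr) \le p\,\mathrm{Tr}(\rho_1^{1-s} \sigma^s) + (1 - p)\,\mathrm{Tr}(\rho_2^{1-s} \sigma^s).$$

Since $\sigma$ is arbitrary, the above inequality is equivalent to $(p\rho_1 + (1 - p)\rho_2)^{1-s} \le
p\rho_1^{1-s} + (1 - p)\rho_2^{1-s}$, which implies the matrix convexity of the map $x \mapsto x^{1-s}$.
Since, for negative $s$, this matrix convexity holds only for $s \in [-1, 0]$, we obtain the contradiction.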
My research on quantum information theory started in October of 1994, when I was
a first year master’s student. At that time, although Shor’s paper on factorization had 
already been published, I was still unaware of his work. Nor was the field of quantum 
information theory very well known. What follows is a brief summary of how I got 
started in the field of quantum information theory. This is merely a personal account 
of my experiences, but I hope that my story will help those considering embarking 
on graduate or postgraduate studies and pursuing a career in research. 

I began my university studies at Kyoto University studying both mathematics and 
physics, thanks to the university’s policy of allowing students to graduate without 
choosing a major. In my case, I was mainly interested in physics, and I decided to 
study both physics and mathematics because I was not entirely comfortable with the 
type of thinking found in physics; I was more naturally inclined toward mathemat- 
ics. As a result, during my undergraduate years, on the one hand, I had a reasonable 
understanding of mathematics; on the other hand, I could not understand physics 
sufficiently. More seriously, I could not grasp the essence of statistical mechanics, in 
which “physics thinking” appears most prominently. In my fourth year of undergrad- 
uate studies, I noticed that, based on my understanding of physics, I probably would 
not pass the entrance exams for graduate course in physics. Therefore, I decided to 
apply to a graduate program in mathematics (into which I was just barely accepted). 
In particular, while I settled on the early universe in the cosmology group as the main 
focus of research in my undergraduate studies, its outcome was rather hopeless due 
to my poor knowledge of statistical mechanics. In fact, when I told a professor of 
physics that I would work the next year as a tutor to help high school students cram 
for their physics exams, he told me, “I would never let you teach physics to anyone.” 
I managed to graduate, but I could not assimilate physics. 

The following April I began my graduate studies in twistor theory [10]¹ under
Professor Ueno, a professor of mathematics at Kyoto University. I chose this topic
because it is related to relativity theory, which I was interested in at that time.
However, as is the case with many topics in mathematical physics, although it is rooted
in physics, it was essentially mathematical. I also realized how difficult it was to understand
the physics behind the mathematical concepts. Ultimately, I realized that it did not
suit my interests. Although I was capable of thinking in a mathematical way, I was
not interested in mathematics itself. Therefore, I could not focus on pure mathematics
and started to search for another research topic. Meanwhile, teaching high school
physics as a tutor to help students cram for exams during my graduate years
led me to the conviction that for the first time I truly understood physics. Until then, I
was enslaved by difficult mathematical structures in physics. At this time, I realized
how important it was to understand physics based on fundamental concepts.

¹ Professor Richard Jozsa also studied twistor theory as a graduate student.

While searching for a new research topic, I met Dr. Akio Fujiwara, who came to 
Osaka University as an assistant professor. He advised me to study Holevo’s textbook 
[11], and I decided that I would start research in quantum information theory. Up until 
that time, I had mainly studied abstract mathematics with little connection to physics. 
I was particularly impressed with the quantum-mechanical concepts described by 
Holevo’s textbook without high levels of abstraction. Although Holevo’s textbook 
was not an easy book to read from the current viewpoint, it was not very difficult 
for me because I had read more difficult books on mathematics. In retrospect, it 
might be fortunate that I did not proceed to a graduate course in physics because the 
physics community had an implicit, unwritten rule never to attempt the measurement 
problem in quantum mechanics due to its philosophical aspect in Japan. Therefore, 
while I appeared to take a rather indirect path during my years in undergraduate 
and graduate courses, my career may have been the most direct path. However, 
I faced a problem upon starting my research. Since I had only studied physics and 
mathematics until that point, I was wholly ignorant of subjects in information science 
such as mathematical statistics. In particular, despite having had the opportunity to 
study these subjects, I had not taken the opportunity. During my undergraduate years, 
compared with physics, which examines the true nature of reality, I regarded statistics 
to be a rather lightweight subject. I considered statistics as only a convenient subject, 
not an essential one. This perception changed as a result of reading Holevo’s text. The 
reason is that it is impossible to quantitatively evaluate the information obtained by an 
observer without a statistical viewpoint because the measurement data are inherently 
probabilistic under the mathematical formulation of quantum mechanics. Ultimately, 
I was forced to study subjects such as mathematical statistics and information theory, 
which should be studied in an undergraduate program. In the end, the research for my 
master’s thesis would be completed with an insufficient knowledge of mathematical 
statistics. 

Further, as another problem, I had no colleagues in this research area that I 
could discuss my research with. Hence, I had to arrange opportunities to discuss 
my work with researchers at distant locations. Moreover, since I was also finan- 
cially quite unstable during the first half of my doctorate program, I was dividing my 
research time back then between casual teaching work in high school and at an exam- 
preparation school. In particular, in the first six months of my doctoral program, my 
research progress was rather slow due to a lack of opportunities to discuss my research 
interests. Then, the Quantum Computation Society in Kansai opened in November
1996, and it gave me the chance to talk about topics closely related to my interests.
As a result, I could continue my research. During this period, I also had many helpful
discussions via telephone with Keiji Matsumoto, who was a research associate at the 
University of Tokyo. Thus, I was able to learn statistics, and I am deeply indebted 
to him. I am also grateful to Professor Kenji Ueno, who accepted me as a graduate 
student until my employment at RIKEN. 

In less than 10 years, the situation in Japan with quantum information theory 
has changed completely. What follows are my thoughts and opinions on the future 
of quantum information theory. Recently, sophisticated quantum operations have 
become a reality, and some quantum protocols have been implemented. I believe 
that it is necessary to propose protocols that are relatively easy to implement. This 
is important not only to motivate further research, but also to have some feedback 
on the foundations of physics. I believe that the techniques developed in information 
theory via quantum information theory will be useful to the foundations of physics. 

Thanks to the efforts of many researchers, the field of quantum information theory 
is now well known. But I feel that many universities in Japan have trouble internaliz- 
ing quantum information theory in the current organization of disciplines. Scientific 
study should have no boundaries among the different fields of knowledge. Hence, I 
take as my point of departure the assumption that it is possible to create a more con- 
structive research and educational environment through the treatment of fields such 
as quantum information theory that transcend the current framework of disciplines. 

My hope is that this book will introduce to quantum information theory people 
dissatisfied with the existing framework of science as it is currently practiced. 